This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/Tooling/DependencyScanning/
-
clang/
-
Tooling/
-
DependencyScanning/
18/22
DependencyScanningFilesystem.h
-
lib/Tooling/DependencyScanning/
-
Tooling/
-
DependencyScanning/
13/25
DependencyScanningFilesystem.cpp
-
unittests/Tooling/
-
Tooling/
-
DependencyScannerTest.cpp

Differential D114966

[clang][deps] Ensure filesystem cache consistency
ClosedPublic

Authored by jansvoboda11 on Dec 2 2021, 9:29 AM.

Download Raw Diff

Details

Reviewers

Bigcheese
dexonsmith
arphaman

Commits

rG5daeada33051: [clang][deps] Ensure filesystem cache consistency

Summary

The minimizing filesystem used by the dependency scanner isn't great when it comes to the consistency of its caches. There are two problems that can be exposed by a filesystem that changes during dependency scan:

In-memory cache entries for original and minimized files are distinct, populated at different times using separate stat/open syscalls. This means that when a file is read with minimization disabled, its contents might be inconsistent when the same file is read with minimization enabled at later point (and vice versa).
In-memory cache entries are indexed by filename. This is problematic for symlinks, where the contents of the symlink might be inconsistent with contents of the original file (for the same reason as in problem 1).

This patch ensures consistency by always stating/reading a file exactly once. The original contents are always cached and minimized contents are derived from that on demand. The cache entries are now indexed by their UniqueID ensuring consistency for symlinks too. Moreover, the stat/read syscalls are now issued outside of critical section.

Depends on D115935.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

jansvoboda11 created this revision.Dec 2 2021, 9:29 AM

Herald added subscribers: dexonsmith, hiraditya. · View Herald TranscriptDec 2 2021, 9:29 AM

jansvoboda11 requested review of this revision.Dec 2 2021, 9:29 AM

Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptDec 2 2021, 9:29 AM

Herald added subscribers: llvm-commits, cfe-commits. · View Herald Transcript

jansvoboda11 added a child revision: D114968: [clang][deps] Avoid reading file for stat calls.Dec 2 2021, 9:38 AM

Add unit test, IWYU.

jansvoboda11 edited the summary of this revision. (Show Details)Dec 2 2021, 10:34 AM

jansvoboda11 added reviewers: Bigcheese, dexonsmith, arphaman.

Harbormaster completed remote builds in B137169: Diff 391369.Dec 2 2021, 10:40 AM

jansvoboda11 edited the summary of this revision. (Show Details)Dec 2 2021, 10:45 AM

jansvoboda11 edited the summary of this revision. (Show Details)Dec 2 2021, 11:54 AM

jansvoboda11 edited the summary of this revision. (Show Details)Dec 2 2021, 11:57 AM

jansvoboda11 edited the summary of this revision. (Show Details)

Thanks for working on this; seems like a great start. At a high-level:

We should check overhead. It'd be good to benchmark scanning LLVM with clang-scan-deps before and after this change.
The locking is getting harder to track, since the acquisition and release are disconnected. I'd rather use a pattern that kept this simple.
Identified a few pre-existing issues that might be exacerbated by (and/or harder to fix after) this refactor.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
138	I think this should be a StringRef (or MemoryBufferRef, which you can construct from two StringRefs, but given that the name is already in the `Status` object probably not useful).
148–149	Doesn't seem to be a good reason to save a null string. Just use a `StringRef()`.
148–150	I find these two typedefs a bit obfuscating. I see that they might provide some benefit in the patch as-is because of the imposed requirement that returned result uses a pointer to a `SmallString<1>`; as such it's important that the type be identical. Instead, it should use a `StringRef` to avoid depending on storage (already commented above). Even if not for that, it could/should use a `SmallVectorImpl<char>` to avoid imposing a specific requirement on the small size. Then `Contents` and `OriginalContents` can be skipped (the latter becoming `std::unique_ptr<llvm::MemoryBuffer>`, but without the obfuscation of a typedef).
153	This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it into a new allocation.
158	name/field match is a bit confusing. I'm not sure the typedef is buying much here.
213–214	You don't need the heavyweight std::map for reference validation. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`. But you can also avoid the allocation traffic by using a BumpPtrAllocator, the same pattern as the StringMap above. E.g.: llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc; llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents > OriginalContentsCache; // insert into shard: OriginalContents &getOriginalContentContainer(...) { std::scoped_lock<std::mutex> L(CacheMutex); OriginalContents &OC = OriginalContents[UID]; if (!OC) OC = new (OriginalContentsAlloc) OriginalContents; return *OC; } // get original content: StringRef getOriginalContentBuffer(...) { OriginalContents &OC = getOriginalContentContainer(...); if (OC.IsInitialized) return OC->Content->getBuffer(); // Could put this after the lock I guess... std::unique_ptr<MemoryBuffer> Content = readFile(...); // check IsInitialized again after locking in case there's a race std::scoped_lock<std::mutex> L(SharedStat.Mutex); if (OC->IsInitialized) return OC->Content->getBuffer(); OC->Content = std::move(Content); OC->IsInitialized = true; return OC->Content->getBuffer(); } Same pattern for minimized content cache. Since the locks are only held briefly there's no need to pass them around and lose clarity about how long it's open. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty).
215–217	I wonder if these should really be separate. Seems better to have something like: struct SharedContent { // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free access/updates. atomic<bool> HasOriginal; std::unique_ptr<MemoryBuffer> Original; // Even better: std::atomic<MinimizedContent >, with the latter bumpptrallocated, to // enable lock-free access/updates. atomic<bool> HasMinimized; SmallString<0> Minimized; // would be nice to bumpptrallocate this string... PreprocessorSkippedRangeMapping PPSkippedRangeMapping; }; SpecificBumpPtrAllocator<SharedCachedContent> ContentAlloc; DenseMap<llvm::sys::fs::UniqueID, SharedCachedContent > ContentCache; With that in place, seems like the `SharedStat` can have `std::atomic<SharedContent >`, which caches the result of the UID lookup. This way the UID `DenseMap` lookup is once per stat name, saving reducing contention on the per-shard lock. Then in the local cache, the only map storage would be: llvm::StringMap<SharedStat , llvm::BumpPtrAllocator> LocalStatCache; No need to duplicate the UID-keyed caches, since the lookups there would set the pointer for the SharedContent.
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
24–32	Is there a potential (already existing) race condition here? Can't the file change between the stat and opening the buffer? Seems like either: The `Stat` should be updated to have the observed size of the buffer. An error should be returned if the size doesn't match. The stat and/or read should be retried until they match.
63–65	This will introduce a memory regression in the common case where there are no PCHs. Previously, only minimized files were saved in memory. These are relatively small, so probably no big deal. Now, the original file is being saved as well. These are not small. Instead, the MemoryBuffer should be saved directly. For large files whose size isn't on a page boundary, this will be an `mmap`. This doesn't count against process memory because the kernel can optimize this easily, such as by sharing between processes (e.g., with actual compilation). For large files on page boundaries, there was already a memcpy done in order to make this null-terminated. No reason to do that again here. For small files, this is already a buffer on the heap... the extra memcpy and allocation probably doesn't matter all that much, but the large file case is worth optimizing for. This wasteful was already around when files weren't being minimized files, but it's going to use a lot more memory now that original files are stored even when they're going to be minimized.
248	I don't love the lack of clarity between when the lock is taken and when it's released caused by this being an out parameter. I don't have a specific suggestion, but maybe there's another way to factor the code overall?
249–251	Calling `readFile()` behind a lock doesn't seem great. I did confirm that the original code seems to do the same thing (lock outside of `createFilesystemEntry`), but this refactor seems to bake the pattern into a few more places. When races aren't very likely it's usually cheaper to: lock to check cache, returning cached result if so without a lock, compute result lock to set cache, but if the cache has been filled in the meantime by another thread, return that and throw out the just-computed one Maybe it'd be useful to add: std::atomic<bool> IsInitialized; to the MinimizedContents and OriginalContents structures stored in the shared cache. This could make it easier to decouple insertion in the shared cache from initialization. I.e., it'd be safe to release the lock while doing work; another thread won't think the default-constructed contents are correct.

In D114966#3168108, @dexonsmith wrote:

Thanks for working on this; seems like a great start.

Thanks a lot for the extensive feedback! I'll work through it and create prep patches where sensible. It seems like things can be simplified quite a bit.

At a high-level:

We should check overhead. It'd be good to benchmark scanning LLVM with clang-scan-deps before and after this change.

In my testing, this patch causes ~20% increase in memory usage.

The locking is getting harder to track, since the acquisition and release are disconnected. I'd rather use a pattern that kept this simple.

I agree. I think I'll explore the direction you suggested in one of your inline comments.

Identified a few pre-existing issues that might be exacerbated by (and/or harder to fix after) this refactor.

That makes sense.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
153	Fixed in new prep-patch: D115043.
213–214	I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion!
213–214	Yeah, `scoped_lock` should be cheaper, I'll create a prep-patch for that.
215–217	I really like idea of keeping a pointer to `SharedContent` in `SharedStat` and avoiding locking & lookup in the content caches. Merging original and minimized contents would probably simplify things quite a bit as well.
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
248	Fair point, I'll try to simplify this.
249–251	Could you expand on this a bit more? If we have a lock for each file, how is locking, reading, unlocking slower than locking, unlocking, reading, locking, unlocking?

dexonsmith added inline comments.Dec 6 2021, 6:12 PM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
249–251	You're right; if there's a lock per-file and all consumers want the result of all computations there's no benefit to releasing the lock quickly. If some consumers only want partial results (or already-computed results), can be faster to release quickly. Could be expensive to have mutexes per-file, since that's A LOT of mutexes. It might be cheaper in aggregate to switch to lock-free here.

jansvoboda11 mentioned this in D115346: [clang][deps] Squash caches for original and minimized files.Dec 9 2021, 7:53 AM

jansvoboda11 added a child revision: D114971: [clang][deps] Handle symlinks in minimizing FS.Dec 14 2021, 2:07 AM

Rebase on top of D115346, apply suggested changes to the cache structure and allocation strategy.

jansvoboda11 edited the summary of this revision. (Show Details)Dec 15 2021, 10:16 AM

jansvoboda11 added a parent revision: D115346: [clang][deps] Squash caches for original and minimized files.

jansvoboda11 marked 3 inline comments as done.Dec 15 2021, 10:19 AM

jansvoboda11 added inline comments.

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
24–32	I think that's right. I left a detailed FIXME in the code calling `read()` and would like to tackle that in a follow up. Would that be fine?

jansvoboda11 retitled this revision from [clang][deps] Split filesystem caches to [clang][deps] Split stat and file contents caches.Dec 15 2021, 10:19 AM

jansvoboda11 retitled this revision from [clang][deps] Split stat and file contents caches to [clang][deps] Split stat and file content caches.

Harbormaster completed remote builds in B139466: Diff 394600.Dec 15 2021, 10:56 AM

dexonsmith added inline comments.Dec 15 2021, 3:03 PM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
24–32	Yup, doing it in a separate commit makes sense. I suggest taking the first option, since it's the simplest.

dexonsmith added inline comments.Dec 15 2021, 4:00 PM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
257–267	I'm not quite following this logic. I think it's safe (and important!) to modify MaybeStat if `read()` fails. We're in a critical section that either creates and partially initializes the entry, or incrementally updates it. In the "creates and partially initializes" case: All other workers will get nullptr for `Cache.getEntry()` and try to enter the critical section. We have just seen a successful MaybeStat value. `needsRead()` will be `true` since we have not read contents before. We will immediately try to read. `read()` should open the original contents and can safely: on success, update the value for MaybeStat to match the observed size on failure, drop the value and set the error for MaybeStat to the observed error When we leave the critical section, either: MaybeStat stores an error; no thread will enter the critical section again OriginalContents are initialized and `needsRead()` returns false In the "incrementally updates" case: `needsRead()` returns false so `read()` will not be called
268	Seems like the existing stat value should be passed into `read()` and the second stat there removed.

dexonsmith added inline comments.Dec 15 2021, 8:07 PM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp

123–124

This name is a bit misleading... looks more like getOrCreateFileContents() to me.

257–267

The key property being the last bullet of the first case: that the "create and partially initializes" case guarantees that either MaybeStat stores an error (isReadable() is false) or OriginalContents is initialized (needsRead() returns false).

I think I've found the part I was missing: that this critical section is for a SharedCacheFileEntry (associated with a filename), but the OriginalContents is a field on a CachedFileContents which could apply to other UIDs (and SharedCache.getFileContents() is actually "get or create"). Since this commit creates the UID map, seems like maybe the race gets worse in this commit? (Not sure)

Another problem: the UID can change between the first stat above (call to getUnderlyingFS().status(Filename)) and the one inside read() if another process is writing at the same time. We can't trust the UID mapping from the first status() call unless content already exists for that UID.

I think to avoid this race you need to delay creating "UID to content" map entry until there is the result of a successful read() to store.

I'll describe an algorithm that I think is fairly clean that handles this. I'm using different data structure names to avoid confusion since I've broken it down a little differently:

ReadResult: stat (for directories) OR error and uid (failed read) OR stat and content (and optional minimized content and pp ranges, and a way to update them atomically))
FilenameMap: map from Filename to ReadResult* (shared and sharded; mirrored locally in each worker)
UIDMap: map from UID to ReadResult* (shared and sharded; probably no local mirror)

And here's the algorithm:

// Top-level API: get the entry/result for some filename.
ErrorOr<ReadResult &> getOrCreateResult(StringRef Filename) {
  if (ReadResult *Result = lookupEntryForFilename(Filename))
    return minimizeIfNecessary(*Result, ShouldMinimize);
  if (ErrorOr<ReadResult &> Result = computeAndStoreResult(Filename))
    return minimizeIfNecessary(*Result, ShouldMinimize);
  else
    return Result.getError();
}
// Compute and store an entry/result for some filename. Returned result
// has in-sync stat+read info (assuming read was successful).
ErrorOr<ReadResult &> computeAndStoreResult(StringRef Filename) {
  ErrorOr<Status> Stat = UnderlyingFS->status(Filename);
  if (!Stat)
    return Stat.getError(); // Can't cache missing files.
  if (ReadResult *Result = lookupEntryForUID(Stat->UID))
    return storeFilenameEntry(Filename, *Result); // UID already known.
  // UID not known. Compute a ReadResult.
  //
  // Unless this is a directory (where we don't need to go back to the FS),
  // ignore existing 'Stat' because without an open file descriptor the UID
  // could change.
  Optional<ReadResult> Result;
  if (Stat->isDirectory())
    Result = ReadResult(*Stat);
  else if (ErrorOr<ReadResult> MaybeResult = computeReadResult(Filename))
    Result = std::move(*MaybeResult);
  else
    return MaybeResult.getError(); // File disappeared...
  // Store the result. Cascade through UID then Filename. Each level could
  // return a different result than it was passed in.
  return storeEntryForFilenameOrReturnExisting(Filename,
             storeEntryForUIDOrReturnExisting(std::move(*Result));
}
// Lookup existing result in FilenameMap. No mutation. First checks local map
// then falls back to the shared map (locks shard, lookup, unlocks, saves in
// local map, returns).
ReadResult *lookupEntryForFilename(StringRef Filename);
// Lookup existing result in UIDMap. No mutation. No local map, just a shared
// map (lockshard+lookup+return).
ReadResult *lookupEntryForUID(UniqueID);
// Compute read result using a single file descriptor.
// - Return error if `open()` fails. Can't cache missing files.
// - Else compute ReadResult: Stat open file descriptor and get a memory buffer from it.
// Note: "Error" state if stat fails.
// Note: "Error" state if stat succeeds and memory buffer does not open.
// Note: if the memory buffer opens successfully, status updated with observed size.
// Note: does not take a UID parameter since live FS could have changed.
// Note: does not access or mutate UIDMap/FilenameMap/etc.
ErrorOr<ReadResult> computeReadResult(StringRef Filename);
// Compare-exchange. Pulls UID out of NewResult. Locks shard for UIDMap[UID]; checks for
// existing result; if none, bump-ptr-allocates and stores NewResult; returns stored
// result.
ReadResult& storeEntryForUIDOrReturnExisting(ReadResult &&NewResult);
// Compare-exchange. Locks shard for FilenameMap[Filename]; checks for existing result;
// if none, stores parameter; unlocks; updates local map with stored result and returns
// it.
ReadResult& storeEntryForFilenameOrReturnExisting(StringRef Filename, ReadResult &);
// If needed and missing, adds minimization info atomically. Note that Result
// may store a cached read error, or a directory.
ReadResult& minimizeIfNecessary(ReadResult& Result, bool ShouldMinimize);

The only thing "lost" is that two workers might both compute a ReadResult for the same file (the slower one having the work dropped on the floor). I'm skeptical this will matter in practice. If some measurement says it does, the FilenameMap could map from Filename to unique_ptr<pair<mutex,atomic<ReadResult*>>> and computeAndStoreResult() could take a lock at the start and re-check the map... but IMO it's better to make this simple and optimize for the common case.

This does leave behind an extended critical section in minimizeIfNecessary() to avoid racing to minimize, implying there's a mutex in ReadResult for updating the MinimizedContents and PPRanges. (I suspect minimization is fast enough (and racing workers rare and cheap enough) that the memory overhead of a per-ReadResult mutex is a bad tradeoff (a simple alternative would be to computeMinimized before the critical section, then take out a "store-minimization" lock just for setting the values (mutex could be sharded by the UID % 64 or something), but I'm less confident.)

This also leaves behind the double-stat behaviour (before and after open) for new files. New files should be pretty rare in the depscanner so maybe this is fine; I've also observed cases where failed open is slower than failed stat, so for the common case of missing files (which don't get cached...) this might be best.

jansvoboda11 added inline comments.Dec 16 2021, 8:45 AM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
257–267	This patch (and D115346) were motivated by D114971, which prevents minimization of symlinks that point to files referenced by precompiled dependencies (e.g. a PCH). When the dependency scanning worker disables minimization of a file referenced by a precompiled dependency, my idea was to immediately "canonicalize" the filename to `UniqueID`s (through stat) and use that when deciding whether to minimize its contents in `getOrCreateFileSystemEntry`. With that approach, the "filename -> `UniqueID`" map/cache acts as the authority for stat information. However, I can see how this breaks when the FS is volatile. Besides the issue outlined in your last comment, the current approach in D114971 prolongs the pause between initial stat (when disabling file minimization - "configure time") and read ("query time"), increasing the chances for observing filesystem volatility. I think your approach makes a lot of sense if we want to be really defensive against volatile FS. Making sure we don't have to re-stat or read files that were already stat-ed during "configuration" or previous "query" would be nice. I think that unfortunately means we need to actually read all input files of precompiled dependencies in D114971. Just stat-ing such files is no longer an option, since that would get us back to square one if we need to later read them (they might've changed). I have implemented your idea locally, and will update this patch tomorrow.

Implement new version that ensures the stat and contents of a file are always in sync.

Harbormaster completed remote builds in B139825: Diff 395102.Dec 17 2021, 5:34 AM

jansvoboda11 edited the summary of this revision. (Show Details)Dec 17 2021, 5:34 AM

jansvoboda11 added a parent revision: D115935: [clang][deps] NFC: Simplify handling of cached FS errors.

Re-run CI.

Harbormaster completed remote builds in B139826: Diff 395103.Dec 17 2021, 6:10 AM

jansvoboda11 added inline comments.Dec 17 2021, 8:01 AM

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
272	I'm not sure these should be separate. We could end up in situation where the Filename map contains different entry than the UID map for the same directory entry. I'm tempted to merge these functions into one and perform the updates in a single critical section...

I'm liking the new direction here; requesting changes since it looks like the filename is being used to pick the shard for UIDMap, which will lead to multiple opinions of what each UID means.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
335	I'd use `const&` to avoid copying the string on the way in... see below.
337	I think, rather than move/copy the status name, the name should be wiped out to ensure no one relies on it. Every access should use `copyWithNewName()` since this is shared across all things that point to the same UID... so let's use `copyWithNewName()` here to drop the ignored name.
362–363	This doesn't look right to me. UIDs should be sharded independently of the filename they happen to have been reached by; otherwise each filename shard is developing its own idea of what each UID means. Since UID distribution is not uniform, probably the UID shard should be chosen by `hash_value(Stat.getUniqueID()) % NumShards`. You could use the same sets of shards for UIDMap and FilenameMap, but since they're independent I'd probably do: UIDCache: sharded by UID: UIDMap and BumpPtrAllocator for entries (and likely anything else tied to content) FilenameCache: sharded by filename: FilenameMap (and perhaps other things tied to filename?)
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
23–24	In what circumstances should this return a cached-error TentativeEntry? Any?
28–29	Since the file was opened, should we return cached-error TentativeEntry here, rather than an error?
33–34	After a successful stat on the same file descriptor, it definitely feels like this is an error that should be cached, and a TentativeEntry that is in an error state should be returned.
272	I'm not sure these should be separate. We could end up in situation where the Filename map contains different entry than the UID map for the same directory entry. I'm also sure precisely what you mean by "for the same directory entry" in this context; and I don't see what's wrong with the situation I think you're outlining. I'm tempted to merge these functions into one and perform the updates in a single critical section... A single critical section for setting UID and filename at the same time would be hard to get right (and efficient), since UIDs have aliases through other filenames due to different directory paths (dir/../x.h vs x.h) and filesystem links (hard and symbolic). Here's the race that I think(?) you're worried about: Worker1 does a tentative stat of "x.h", finds a UID that isn't mapped (UIDX1, but it's ignored...). Worker2 does a tentative stat of "x.h", finds a UID that isn't mapped (UIDX1, but it's ignored...). Worker1 opens "x.h", finds ContentX1+StatX1 (with UIDX1), saves mapping UIDX1 -> ContentX1+StatX1. "x.h" changes. Worker2 opens "x.h", finds ContentX2+StatX2 (with UIDX2), saves mapping UIDX2 -> ContentX2+StatX2. Worker2 saves mapping "x.h" -> ContentX2+StatX2. Both workers move forward with "x.h" -> ContentX2+StatX2. IIUC, you're concerned that the mapping UIDX1 -> ContentX1+StatX1 was saved. The side effect is that if a future tentative stat of (e.g.) "y.h" returns UIDX1, then "y.h" will be mapped to ContentX1+StatX1. Is this what concerns you? Why? (Is it something else?) The concern I have is that some filesystems recycle UIDs (maybe "x.h" was a symbolic link to "y.h" and then became its own file... or maybe "x.h" and "y.h" were hard links... or maybe "y.h" is just a new file!). But that's a problem with using UIDs to detect equivalent filesystem links / content in general. I don't see any reason to be more concerned here than elsewhere, and to avoid depending on UID we'd need a pretty different design (e.g., lazily detect and model directory structure and symbolic links).

This revision now requires changes to proceed.Dec 20 2021, 10:31 AM

jansvoboda11 added inline comments.Dec 21 2021, 7:55 AM

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
362–363	Hmm, skimming through `RealFileSystem::status`, I saw that it's calling `sys::fs::status` with "follow symlinks" enabled. It made sense to me that the name stored in `llvm::vfs::Status` would match that and refer the fully resolved target entry, not the symlink itself. Seeing as this is not the case, I agree the UID itself should be used for choosing the shard.
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
23–24	I don't think the distinction matters at this level. Whether failures should be cached is a decision that's being made one level up. I personally prefer having `TentativeEntry` to be "non-fallible" and explicitly wrapping the whole thing in `ErrorOr`. That makes it easier for others to know what they are working with (i.e. this object cannot represent an error state). Eventually, I think this would make sense for the caches and `CachedFileSystemEntry` too.
28–29	Let's return an error here and create error `CachedFileSystemEntry` one level up.
272	Yes, that's the kind of scenario I was thinking about. I'm not concerned about consequences of that side effect, I just don't like storing garbage that will most likely never be used/referenced again and might be confusing during debugging. I agree with you on UID recycling...

Erase filenames in temporary Stat objects, use UniqueID as shard key where appropriate.

Harbormaster completed remote builds in B140260: Diff 395684.Dec 21 2021, 8:37 AM

Okay, I think this is the last round. Everything looks correct, except a few out-of-date comments.

Two things, one small, one bigger (but not too big I think).

Smaller one is that there's an unnecessary string copy when getting the status.

Bigger one is that I think CacheShard::ContentsCache can be deleted, since the map is fully redundant with CacheShard::EntriesByUID (most of the inline comments are about that). This is because they are both indexed by UID and created at the same time (two immediately adjacent locks of the same shard mutex with no computation in between). Relatedly, CachedFileSystemEntry::ContentsAccess is never modified after construction. Seems reasonable to keep a separate allocation for the CachedFileContent since it's fairly big and directories (and cached errors) don't need them, but the CachedFileSystemEntry can just point at a raw pointer whose value is never expected to change and we don't need the separate map.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
45–56	This should mention that it's shared across different filenames, only one per UID now, and the filename is empty in the stat.
139–145	This comment is out of date, since there's a 1:1 correspondence between CachedFileSystemEntry and CachedFileContents since they're both looked up by UID. Also, `ContentsAccess` is never modified after construction so it can just be a raw pointer. Outlining the allocation of `CachedFileContents` might still make sense, since it's big and stat failures (with no content) are probably the common case due to header search patterns... only if we actually create these for stat failures though. Might be worth a comment saying why it's outlined. Maybe it should even be delayed to a follow-up commit to simplify this patch, since now that CachedFileSystemEntry is per-UID it doesn't seem to be required here... but the fields probably need to be made `mutable` regardless somehow so it doesn't seem like a ton of churn.
170–171	I think this map can be deleted, since it's not actually used to deduplicate/share anything that `EntriesByUID` doesn't handle.
203–208	I think this can be removed / merged with `getOrEmplaceEntryForUID()`, based on how it's used.
256–259	I think there's a new/unnecessary std::string copy in the case where `copyWithNewSize()` happens. If the size were fixed first that could be avoided: auto Stat = Entry.getStatus(); if (!Stat.isDirectory()) Stat = copyWithNewSize(...); return copyWithNewName(Stat, Filename);
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
161–171	Looks like the two UID maps are always filled directly after each other. Seems like we can reduce lookups like this.
181–184	I think this can be merged into getOrEmplaceEntryForUID.
228–235	I think this can be one call since they're taking the same lock and always done one after the other.

This revision now requires changes to proceed.Jan 12 2022, 6:03 PM

Update comments, remove std::atomic<>, merge getOrEmplaceContentsForUID into getOrEmplaceEntryForUID.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
139–145	I've updated the comment and changed `ContentsAccess` to a raw pointer. I agree outlining `CachedFileContents` in a follow-up patch would be cleaner, but I don't have much time to spare on prettifying git history at this moment unfortunately.
170–171	Yeah, that's right. Removed this in the latest revision.

Harbormaster completed remote builds in B144246: Diff 401171.Jan 19 2022, 5:44 AM

LGTM.

This revision is now accepted and ready to land.Jan 19 2022, 1:29 PM

jansvoboda11 retitled this revision from [clang][deps] Split stat and file content caches to [clang][deps] Ensure filesystem cache consistency.Jan 21 2022, 4:02 AM

jansvoboda11 edited the summary of this revision. (Show Details)

This revision was landed with ongoing or failed builds.Jan 21 2022, 4:04 AM

Closed by commit rG5daeada33051: [clang][deps] Ensure filesystem cache consistency (authored by jansvoboda11). · Explain Why

This revision was automatically updated to reflect the committed changes.

jansvoboda11 added a commit: rG5daeada33051: [clang][deps] Ensure filesystem cache consistency.

Revision Contents

Path

Size

clang/

include/

clang/

Tooling/

DependencyScanning/

DependencyScanningFilesystem.h

302 lines

lib/

Tooling/

DependencyScanning/

DependencyScanningFilesystem.cpp

224 lines

unittests/

Tooling/

DependencyScannerTest.cpp

4 lines

Diff 401934

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h

Show All 16 Lines
#include "llvm/Support/ErrorOr.h"		#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/VirtualFileSystem.h"		#include "llvm/Support/VirtualFileSystem.h"
#include <mutex>		#include <mutex>

namespace clang {		namespace clang {
namespace tooling {		namespace tooling {
namespace dependencies {		namespace dependencies {

		/// Original and minimized contents of a cached file entry. Single instance can
		/// be shared between multiple entries.
		struct CachedFileContents {
		CachedFileContents(std::unique_ptr<llvm::MemoryBuffer> Original)
		: Original(std::move(Original)), MinimizedAccess(nullptr) {}

		/// Owning storage for the minimized contents.
		std::unique_ptr<llvm::MemoryBuffer> Original;

		/// The mutex that must be locked before mutating minimized contents.
		std::mutex ValueLock;
		/// Owning storage for the minimized contents.
		std::unique_ptr<llvm::MemoryBuffer> MinimizedStorage;
		/// Accessor to the minimized contents that's atomic to avoid data races.
		std::atomic<llvm::MemoryBuffer *> MinimizedAccess;
		/// Skipped range mapping of the minimized contents.
		/// This is initialized iff `MinimizedAccess != nullptr`.
		PreprocessorSkippedRangeMapping PPSkippedRangeMapping;
		};

/// An in-memory representation of a file system entity that is of interest to		/// An in-memory representation of a file system entity that is of interest to
/// the dependency scanning filesystem.		/// the dependency scanning filesystem.
///		///
/// It represents one of the following:		/// It represents one of the following:
/// - opened file with original contents and a stat value,		/// - opened file with original contents and a stat value,
/// - opened file with original contents, minimized contents and a stat value,		/// - opened file with original contents, minimized contents and a stat value,
/// - directory entry with its stat value,		/// - directory entry with its stat value,
/// - filesystem error,		/// - filesystem error.
/// - uninitialized entry with unknown status.		///
		/// Single instance of this class can be shared across different filenames (e.g.
		/// a regular file and a symlink). For this reason the status filename is empty
		/// and is only materialized by \c EntryRef that knows the requested filename.
		dexonsmithUnsubmitted Done Reply Inline Actions This should mention that it's shared across different filenames, only one per UID now, and the filename is empty in the stat. dexonsmith: This should mention that it's shared across different filenames, only one per UID now, and the…
class CachedFileSystemEntry {		class CachedFileSystemEntry {
public:		public:
/// Creates an uninitialized entry.		/// Creates an entry without contents: either a filesystem error or
CachedFileSystemEntry()		/// a directory with stat value.
: MaybeStat(llvm::vfs::Status()), MinimizedContentsAccess(nullptr) {}		CachedFileSystemEntry(llvm::ErrorOr<llvm::vfs::Status> Stat)
		: MaybeStat(std::move(Stat)), Contents(nullptr) {
/// Initialize the cached file system entry.		clearStatName();
void init(llvm::ErrorOr<llvm::vfs::Status> &&MaybeStatus, StringRef Filename,		}
llvm::vfs::FileSystem &FS);

/// Initialize the entry as file with minimized or original contents.		/// Creates an entry representing a file with contents.
///		CachedFileSystemEntry(llvm::ErrorOr<llvm::vfs::Status> Stat,
/// The filesystem opens the file even for `stat` calls open to avoid the		CachedFileContents *Contents)
/// issues with stat + open of minimized files that might lead to a		: MaybeStat(std::move(Stat)), Contents(std::move(Contents)) {
/// mismatching size of the file.		clearStatName();
llvm::ErrorOr<llvm::vfs::Status> initFile(StringRef Filename,
llvm::vfs::FileSystem &FS);

/// Minimize contents of the file.
void minimizeFile();

/// \returns True if the entry is initialized.
bool isInitialized() const {
return !MaybeStat \|\| MaybeStat->isStatusKnown();
}		}

/// \returns True if the entry is a filesystem error.		/// \returns True if the entry is a filesystem error.
bool isError() const { return !MaybeStat; }		bool isError() const { return !MaybeStat; }

/// \returns True if the current entry points to a directory.		/// \returns True if the current entry represents a directory.
bool isDirectory() const { return !isError() && MaybeStat->isDirectory(); }		bool isDirectory() const { return !isError() && MaybeStat->isDirectory(); }

/// \returns Original contents of the file.		/// \returns Original contents of the file.
StringRef getOriginalContents() const {		StringRef getOriginalContents() const {
assert(isInitialized() && "not initialized");
assert(!isError() && "error");		assert(!isError() && "error");
assert(!MaybeStat->isDirectory() && "not a file");		assert(!MaybeStat->isDirectory() && "not a file");
assert(OriginalContents && "not read");		assert(Contents && "contents not initialized");
return OriginalContents->getBuffer();		return Contents->Original->getBuffer();
}		}

/// \returns Minimized contents of the file.		/// \returns Minimized contents of the file.
StringRef getMinimizedContents() const {		StringRef getMinimizedContents() const {
assert(isInitialized() && "not initialized");
assert(!isError() && "error");		assert(!isError() && "error");
assert(!isDirectory() && "not a file");		assert(!MaybeStat->isDirectory() && "not a file");
llvm::MemoryBuffer *Buffer = MinimizedContentsAccess.load();		assert(Contents && "contents not initialized");
		llvm::MemoryBuffer *Buffer = Contents->MinimizedAccess.load();
assert(Buffer && "not minimized");		assert(Buffer && "not minimized");
return Buffer->getBuffer();		return Buffer->getBuffer();
}		}

/// \returns True if this entry represents a file that can be read.
bool isReadable() const { return MaybeStat && !MaybeStat->isDirectory(); }

/// \returns True if this cached entry needs to be updated.
bool needsUpdate(bool ShouldBeMinimized) const {
return isReadable() && needsMinimization(ShouldBeMinimized);
}

/// \returns True if the contents of this entry need to be minimized.
bool needsMinimization(bool ShouldBeMinimized) const {
return ShouldBeMinimized && !MinimizedContentsAccess.load();
}

/// \returns The error.		/// \returns The error.
std::error_code getError() const {		std::error_code getError() const { return MaybeStat.getError(); }
assert(isInitialized() && "not initialized");
return MaybeStat.getError();
}

/// \returns The entry status.		/// \returns The entry status with empty filename.
llvm::vfs::Status getStatus() const {		llvm::vfs::Status getStatus() const {
assert(isInitialized() && "not initialized");
assert(!isError() && "error");		assert(!isError() && "error");
		assert(MaybeStat->getName().empty() && "stat name must be empty");
return *MaybeStat;		return *MaybeStat;
}		}

/// \returns the name of the file.		/// \returns The unique ID of the entry.
StringRef getName() const {		llvm::sys::fs::UniqueID getUniqueID() const {
assert(isInitialized() && "not initialized");
assert(!isError() && "error");		assert(!isError() && "error");
return MaybeStat->getName();		return MaybeStat->getUniqueID();
}		}

/// Return the mapping between location -> distance that is used to speed up		/// \returns The mapping between location -> distance that is used to speed up
/// the block skipping in the preprocessor.		/// the block skipping in the preprocessor.
const PreprocessorSkippedRangeMapping &getPPSkippedRangeMapping() const {		const PreprocessorSkippedRangeMapping &getPPSkippedRangeMapping() const {
assert(!isError() && "error");		assert(!isError() && "error");
assert(!isDirectory() && "not a file");		assert(!isDirectory() && "not a file");
return PPSkippedRangeMapping;		assert(Contents && "contents not initialized");
		return Contents->PPSkippedRangeMapping;
		}

		/// \returns The data structure holding both original and minimized contents.
		CachedFileContents *getContents() const {
		assert(!isError() && "error");
		assert(!isDirectory() && "not a file");
		return Contents;
}		}

private:		private:
llvm::ErrorOr<llvm::vfs::Status> MaybeStat;		void clearStatName() {
std::unique_ptr<llvm::MemoryBuffer> OriginalContents;		if (MaybeStat)
		MaybeStat = llvm::vfs::Status::copyWithNewName(*MaybeStat, "");
		}

/// Owning storage for the minimized file contents.		/// Either the filesystem error or status of the entry.
std::unique_ptr<llvm::MemoryBuffer> MinimizedContentsStorage;		/// The filename is empty and only materialized by \c EntryRef.
/// Atomic view of the minimized file contents.		llvm::ErrorOr<llvm::vfs::Status> MaybeStat;
/// This prevents data races when multiple threads call `needsMinimization`.
std::atomic<llvm::MemoryBuffer *> MinimizedContentsAccess;

		dexonsmithUnsubmitted Done Reply Inline Actions I think this should be a StringRef (or MemoryBufferRef, which you can construct from two StringRefs, but given that the name is already in the `Status` object probably not useful). dexonsmith: I think this should be a StringRef (or MemoryBufferRef, which you can construct from two…
PreprocessorSkippedRangeMapping PPSkippedRangeMapping;		/// Non-owning pointer to the file contents.
		///
		/// We're using pointer here to keep the size of this class small. Instances
		/// representing directories and filesystem errors don't hold any contents
		/// anyway.
		CachedFileContents *Contents;
};		};
		dexonsmithUnsubmitted Not Done Reply Inline Actions This comment is out of date, since there's a 1:1 correspondence between CachedFileSystemEntry and CachedFileContents since they're both looked up by UID. Also, `ContentsAccess` is never modified after construction so it can just be a raw pointer. Outlining the allocation of `CachedFileContents` might still make sense, since it's big and stat failures (with no content) are probably the common case due to header search patterns... only if we actually create these for stat failures though. Might be worth a comment saying why it's outlined. Maybe it should even be delayed to a follow-up commit to simplify this patch, since now that CachedFileSystemEntry is per-UID it doesn't seem to be required here... but the fields probably need to be made `mutable` regardless somehow so it doesn't seem like a ton of churn. dexonsmith: This comment is out of date, since there's a 1:1 correspondence between CachedFileSystemEntry…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions I've updated the comment and changed `ContentsAccess` to a raw pointer. I agree outlining `CachedFileContents` in a follow-up patch would be cleaner, but I don't have much time to spare on prettifying git history at this moment unfortunately. jansvoboda11: I've updated the comment and changed `ContentsAccess` to a raw pointer. I agree outlining…

/// This class is a shared cache, that caches the 'stat' and 'open' calls to the		/// This class is a shared cache, that caches the 'stat' and 'open' calls to the
/// underlying real file system. It distinguishes between minimized and original		/// underlying real file system. It distinguishes between minimized and original
/// files.		/// files.
		dexonsmithUnsubmitted Done Reply Inline Actions Doesn't seem to be a good reason to save a null string. Just use a `StringRef()`. dexonsmith: Doesn't seem to be a good reason to save a null string. Just use a `StringRef()`.
///		///
		dexonsmithUnsubmitted Done Reply Inline Actions I find these two typedefs a bit obfuscating. I see that they might provide some benefit in the patch as-is because of the imposed requirement that returned result uses a pointer to a `SmallString<1>`; as such it's important that the type be identical. Instead, it should use a `StringRef` to avoid depending on storage (already commented above). Even if not for that, it could/should use a `SmallVectorImpl<char>` to avoid imposing a specific requirement on the small size. Then `Contents` and `OriginalContents` can be skipped (the latter becoming `std::unique_ptr<llvm::MemoryBuffer>`, but without the obfuscation of a typedef). dexonsmith: I find these two typedefs a bit obfuscating. I see that they might provide some benefit in the…
/// It is sharded based on the hash of the key to reduce the lock contention for		/// It is sharded based on the hash of the key to reduce the lock contention for
/// the worker threads.		/// the worker threads.
class DependencyScanningFilesystemSharedCache {		class DependencyScanningFilesystemSharedCache {
		dexonsmithUnsubmitted Done Reply Inline Actions This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it into a new allocation. dexonsmith: This should be the `std::unique_ptr<MemoryBuffer>` from disk. There's no reason to `memcpy` it…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions Fixed in new prep-patch: D115043. jansvoboda11: Fixed in new prep-patch: D115043.
public:		public:
struct SharedFileSystemEntry {		struct CacheShard {
std::mutex ValueLock;		/// The mutex that needs to be locked before mutation of any member.
CachedFileSystemEntry Value;		mutable std::mutex CacheLock;

		dexonsmithUnsubmitted Done Reply Inline Actions name/field match is a bit confusing. I'm not sure the typedef is buying much here. dexonsmith: name/field match is a bit confusing. I'm not sure the typedef is buying much here.
		/// Map from filenames to cached entries.
		llvm::StringMap<const CachedFileSystemEntry *, llvm::BumpPtrAllocator>
		EntriesByFilename;

		/// Map from unique IDs to cached entries.
		llvm::DenseMap<llvm::sys::fs::UniqueID, const CachedFileSystemEntry *>
		EntriesByUID;

		/// The backing storage for cached entries.
		llvm::SpecificBumpPtrAllocator<CachedFileSystemEntry> EntryStorage;

		/// The backing storage for cached contents.
		llvm::SpecificBumpPtrAllocator<CachedFileContents> ContentsStorage;
		dexonsmithUnsubmitted Done Reply Inline Actions I think this map can be deleted, since it's not actually used to deduplicate/share anything that `EntriesByUID` doesn't handle. dexonsmith: I think this map can be deleted, since it's not actually used to deduplicate/share anything…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions Yeah, that's right. Removed this in the latest revision. jansvoboda11: Yeah, that's right. Removed this in the latest revision.

		/// Returns entry associated with the filename or nullptr if none is found.
		const CachedFileSystemEntry *findEntryByFilename(StringRef Filename) const;

		/// Returns entry associated with the unique ID or nullptr if none is found.
		const CachedFileSystemEntry *
		findEntryByUID(llvm::sys::fs::UniqueID UID) const;

		/// Returns entry associated with the filename if there is some. Otherwise,
		/// constructs new one with the given status, associates it with the
		/// filename and returns the result.
		const CachedFileSystemEntry &
		getOrEmplaceEntryForFilename(StringRef Filename,
		llvm::ErrorOr<llvm::vfs::Status> Stat);

		/// Returns entry associated with the unique ID if there is some. Otherwise,
		/// constructs new one with the given status and contents, associates it
		/// with the unique ID and returns the result.
		const CachedFileSystemEntry &
		getOrEmplaceEntryForUID(llvm::sys::fs::UniqueID UID, llvm::vfs::Status Stat,
		std::unique_ptr<llvm::MemoryBuffer> Contents);

		/// Returns entry associated with the filename if there is some. Otherwise,
		/// associates the given entry with the filename and returns it.
		const CachedFileSystemEntry &
		getOrInsertEntryForFilename(StringRef Filename,
		const CachedFileSystemEntry &Entry);
};		};

DependencyScanningFilesystemSharedCache();		DependencyScanningFilesystemSharedCache();

/// Returns a cache entry for the corresponding key.		/// Returns shard for the given key.
///		CacheShard &getShardForFilename(StringRef Filename) const;
/// A new cache entry is created if the key is not in the cache. This is a		CacheShard &getShardForUID(llvm::sys::fs::UniqueID UID) const;
/// thread safe call.
SharedFileSystemEntry &get(StringRef Key);

private:		private:
struct CacheShard {
std::mutex CacheLock;
llvm::StringMap<SharedFileSystemEntry, llvm::BumpPtrAllocator> Cache;
};
std::unique_ptr<CacheShard[]> CacheShards;		std::unique_ptr<CacheShard[]> CacheShards;
		dexonsmithUnsubmitted Done Reply Inline Actions I think this can be removed / merged with `getOrEmplaceEntryForUID()`, based on how it's used. dexonsmith: I think this can be removed / merged with `getOrEmplaceEntryForUID()`, based on how it's used.
unsigned NumShards;		unsigned NumShards;
};		};

/// This class is a local cache, that caches the 'stat' and 'open' calls to the		/// This class is a local cache, that caches the 'stat' and 'open' calls to the
/// underlying real file system. It distinguishes between minimized and original		/// underlying real file system. It distinguishes between minimized and original
/// files.		/// files.
		dexonsmithUnsubmitted Done Reply Inline Actions You don't need the heavyweight std::map for reference validation. You can just use a `DenseMap<KeyT, std::unique_ptr<ValueT>>`. That's pretty expensive due to allocation traffic, but it's still cheaper than a `std::map`. But you can also avoid the allocation traffic by using a BumpPtrAllocator, the same pattern as the StringMap above. E.g.: llvm::SpecificBumpPtrAllocator<OriginalContents> OriginalContentsAlloc; llvm::DenseMap<llvm::sys::fs::UniqueID, OriginalContents > OriginalContentsCache; // insert into shard: OriginalContents &getOriginalContentContainer(...) { std::scoped_lock<std::mutex> L(CacheMutex); OriginalContents &OC = OriginalContents[UID]; if (!OC) OC = new (OriginalContentsAlloc) OriginalContents; return OC; } // get original content: StringRef getOriginalContentBuffer(...) { OriginalContents &OC = getOriginalContentContainer(...); if (OC.IsInitialized) return OC->Content->getBuffer(); // Could put this after the lock I guess... std::unique_ptr<MemoryBuffer> Content = readFile(...); // check IsInitialized again after locking in case there's a race std::scoped_lock<std::mutex> L(SharedStat.Mutex); if (OC->IsInitialized) return OC->Content->getBuffer(); OC->Content = std::move(Content); OC->IsInitialized = true; return OC->Content->getBuffer(); } Same pattern for minimized content cache. Since the locks are only held briefly there's no need to pass them around and lose clarity about how long it's open. Also, IIRC, `std::unique_lock` is more expensive than `std::scoped_lock` (but my memory could be faulty). dexonsmith:* You don't need the heavyweight std::map for reference validation. You can just use a…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the suggestion! jansvoboda11: I didn't think of using `SpecificBumpPtrAllocator` this way, seems really neat, thanks for the…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions Yeah, `scoped_lock` should be cheaper, I'll create a prep-patch for that. jansvoboda11: Yeah, `scoped_lock` should be cheaper, I'll create a prep-patch for that.
class DependencyScanningFilesystemLocalCache {		class DependencyScanningFilesystemLocalCache {
llvm::StringMap<const CachedFileSystemEntry *, llvm::BumpPtrAllocator> Cache;		llvm::StringMap<const CachedFileSystemEntry *, llvm::BumpPtrAllocator> Cache;

		dexonsmithUnsubmitted Done Reply Inline Actions I wonder if these should really be separate. Seems better to have something like: struct SharedContent { // Even better: unique_atomic_ptr<MemoryBuffer>, to enable lock-free access/updates. atomic<bool> HasOriginal; std::unique_ptr<MemoryBuffer> Original; // Even better: std::atomic<MinimizedContent >, with the latter bumpptrallocated, to // enable lock-free access/updates. atomic<bool> HasMinimized; SmallString<0> Minimized; // would be nice to bumpptrallocate this string... PreprocessorSkippedRangeMapping PPSkippedRangeMapping; }; SpecificBumpPtrAllocator<SharedCachedContent> ContentAlloc; DenseMap<llvm::sys::fs::UniqueID, SharedCachedContent > ContentCache; With that in place, seems like the `SharedStat` can have `std::atomic<SharedContent >`, which caches the result of the UID lookup. This way the UID `DenseMap` lookup is once per stat name, saving reducing contention on the per-shard lock. Then in the local cache, the only map storage would be: llvm::StringMap<SharedStat , llvm::BumpPtrAllocator> LocalStatCache; No need to duplicate the UID-keyed caches, since the lookups there would set the pointer for the SharedContent. dexonsmith: I wonder if these should really be separate. Seems better to have something like: ``` lang=c++…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions I really like idea of keeping a pointer to `SharedContent` in `SharedStat` and avoiding locking & lookup in the content caches. Merging original and minimized contents would probably simplify things quite a bit as well. jansvoboda11: I really like idea of keeping a pointer to `SharedContent` in `SharedStat` and avoiding locking…
public:		public:
const CachedFileSystemEntry *getCachedEntry(StringRef Filename) {		/// Returns entry associated with the filename or nullptr if none is found.
return Cache[Filename];		const CachedFileSystemEntry *findEntryByFilename(StringRef Filename) const {
		auto It = Cache.find(Filename);
		return It == Cache.end() ? nullptr : It->getValue();
		}

		/// Associates the given entry with the filename and returns the given entry
		/// pointer (for convenience).
		const CachedFileSystemEntry &
		insertEntryForFilename(StringRef Filename,
		const CachedFileSystemEntry &Entry) {
		const auto *InsertedEntry = Cache.insert({Filename, &Entry}).first->second;
		assert(InsertedEntry == &Entry && "entry already present");
		return *InsertedEntry;
}		}
};		};

/// Reference to a CachedFileSystemEntry.		/// Reference to a CachedFileSystemEntry.
/// If the underlying entry is an opened file, this wrapper returns the correct		/// If the underlying entry is an opened file, this wrapper returns the correct
/// contents (original or minimized) and ensures consistency with file size		/// contents (original or minimized) and ensures consistency with file size
/// reported by status.		/// reported by status.
class EntryRef {		class EntryRef {
/// For entry that is an opened file, this bit signifies whether its contents		/// For entry that is an opened file, this bit signifies whether its contents
/// are minimized.		/// are minimized.
bool Minimized;		bool Minimized;

		/// The filename used to access this entry.
		std::string Filename;

/// The underlying cached entry.		/// The underlying cached entry.
const CachedFileSystemEntry &Entry;		const CachedFileSystemEntry &Entry;

public:		public:
EntryRef(bool Minimized, const CachedFileSystemEntry &Entry)		EntryRef(bool Minimized, StringRef Name, const CachedFileSystemEntry &Entry)
: Minimized(Minimized), Entry(Entry) {}		: Minimized(Minimized), Filename(Name), Entry(Entry) {}

llvm::vfs::Status getStatus() const {		llvm::vfs::Status getStatus() const {
llvm::vfs::Status Stat = Entry.getStatus();		llvm::vfs::Status Stat = Entry.getStatus();
if (Stat.isDirectory())		if (!Stat.isDirectory())
return Stat;		Stat = llvm::vfs::Status::copyWithNewSize(Stat, getContents().size());
return llvm::vfs::Status::copyWithNewSize(Stat, getContents().size());		return llvm::vfs::Status::copyWithNewName(Stat, Filename);
		dexonsmithUnsubmitted Done Reply Inline Actions I think there's a new/unnecessary std::string copy in the case where `copyWithNewSize()` happens. If the size were fixed first that could be avoided: auto Stat = Entry.getStatus(); if (!Stat.isDirectory()) Stat = copyWithNewSize(...); return copyWithNewName(Stat, Filename); dexonsmith: I think there's a new/unnecessary std::string copy in the case where `copyWithNewSize()`…
}		}

bool isError() const { return Entry.isError(); }		bool isError() const { return Entry.isError(); }
bool isDirectory() const { return Entry.isDirectory(); }		bool isDirectory() const { return Entry.isDirectory(); }
StringRef getName() const { return Entry.getName(); }

/// If the cached entry represents an error, promotes it into `ErrorOr`.		/// If the cached entry represents an error, promotes it into `ErrorOr`.
llvm::ErrorOr<EntryRef> unwrapError() const {		llvm::ErrorOr<EntryRef> unwrapError() const {
if (isError())		if (isError())
return Entry.getError();		return Entry.getError();
return *this;		return *this;
}		}

Show All 33 Lines	public:
void disableMinimization(StringRef Filename);		void disableMinimization(StringRef Filename);
/// Enable minimization of all files.		/// Enable minimization of all files.
void enableMinimizationOfAllFiles() { NotToBeMinimized.clear(); }		void enableMinimizationOfAllFiles() { NotToBeMinimized.clear(); }

private:		private:
/// Check whether the file should be minimized.		/// Check whether the file should be minimized.
bool shouldMinimize(StringRef Filename);		bool shouldMinimize(StringRef Filename);

		/// Returns entry for the given filename.
		///
		/// Attempts to use the local and shared caches first, then falls back to
		/// using the underlying filesystem.
llvm::ErrorOr<EntryRef> getOrCreateFileSystemEntry(StringRef Filename);		llvm::ErrorOr<EntryRef> getOrCreateFileSystemEntry(StringRef Filename);

		/// For a filename that's not yet associated with any entry in the caches,
		/// uses the underlying filesystem to either look up the entry based in the
		/// shared cache indexed by unique ID, or creates new entry from scratch.
		llvm::ErrorOr<const CachedFileSystemEntry &>
		computeAndStoreResult(StringRef Filename);

		/// Minimizes the given entry if necessary and returns a wrapper object with
		/// reference semantics.
		EntryRef minimizeIfNecessary(const CachedFileSystemEntry &Entry,
		StringRef Filename);

		/// Represents a filesystem entry that has been stat-ed (and potentially read)
		/// and that's about to be inserted into the cache as `CachedFileSystemEntry`.
		struct TentativeEntry {
		llvm::vfs::Status Status;
		std::unique_ptr<llvm::MemoryBuffer> Contents;

		dexonsmithUnsubmitted Not Done Reply Inline Actions I'd use `const&` to avoid copying the string on the way in... see below. dexonsmith: I'd use `const&` to avoid copying the string on the way in... see below.
		TentativeEntry(llvm::vfs::Status Status,
		std::unique_ptr<llvm::MemoryBuffer> Contents = nullptr)
		dexonsmithUnsubmitted Not Done Reply Inline Actions I think, rather than move/copy the status name, the name should be wiped out to ensure no one relies on it. Every access should use `copyWithNewName()` since this is shared across all things that point to the same UID... so let's use `copyWithNewName()` here to drop the ignored name. dexonsmith: I think, rather than move/copy the status name, the name should be wiped out to ensure no one…
		: Status(std::move(Status)), Contents(std::move(Contents)) {}
		};

		/// Reads file at the given path. Enforces consistency between the file size
		/// in status and size of read contents.
		llvm::ErrorOr<TentativeEntry> readFile(StringRef Filename);

		/// Returns entry associated with the unique ID of the given tentative entry
		/// if there is some in the shared cache. Otherwise, constructs new one,
		/// associates it with the unique ID and returns the result.
		const CachedFileSystemEntry &
		getOrEmplaceSharedEntryForUID(TentativeEntry TEntry);

		/// Returns entry associated with the filename or nullptr if none is found.
		///
		/// Returns entry from local cache if there is some. Otherwise, if the entry
		/// is found in the shared cache, writes it through the local cache and
		/// returns it. Otherwise returns nullptr.
		const CachedFileSystemEntry *
		findEntryByFilenameWithWriteThrough(StringRef Filename);

		/// Returns entry associated with the unique ID in the shared cache or nullptr
		/// if none is found.
		const CachedFileSystemEntry *
		findSharedEntryByUID(llvm::vfs::Status Stat) const {
		return SharedCache.getShardForUID(Stat.getUniqueID())
		dexonsmithUnsubmitted Not Done Reply Inline Actions This doesn't look right to me. UIDs should be sharded independently of the filename they happen to have been reached by; otherwise each filename shard is developing its own idea of what each UID means. Since UID distribution is not uniform, probably the UID shard should be chosen by `hash_value(Stat.getUniqueID()) % NumShards`. You could use the same sets of shards for UIDMap and FilenameMap, but since they're independent I'd probably do: UIDCache: sharded by UID: UIDMap and BumpPtrAllocator for entries (and likely anything else tied to content) FilenameCache: sharded by filename: FilenameMap (and perhaps other things tied to filename?) dexonsmith: This doesn't look right to me. UIDs should be sharded independently of the filename they happen…
		jansvoboda11AuthorUnsubmitted Done Reply Inline Actions Hmm, skimming through `RealFileSystem::status`, I saw that it's calling `sys::fs::status` with "follow symlinks" enabled. It made sense to me that the name stored in `llvm::vfs::Status` would match that and refer the fully resolved target entry, not the symlink itself. Seeing as this is not the case, I agree the UID itself should be used for choosing the shard. jansvoboda11: Hmm, skimming through `RealFileSystem::status`, I saw that it's calling `sys::fs::status` with…
		.findEntryByUID(Stat.getUniqueID());
		}

		/// Associates the given entry with the filename in the local cache and
		/// returns it.
		const CachedFileSystemEntry &
		insertLocalEntryForFilename(StringRef Filename,
		const CachedFileSystemEntry &Entry) {
		return LocalCache.insertEntryForFilename(Filename, Entry);
		}

		/// Returns entry associated with the filename in the shared cache if there is
		/// some. Otherwise, constructs new one with the given error code, associates
		/// it with the filename and returns the result.
		const CachedFileSystemEntry &
		getOrEmplaceSharedEntryForFilename(StringRef Filename, std::error_code EC) {
		return SharedCache.getShardForFilename(Filename)
		.getOrEmplaceEntryForFilename(Filename, EC);
		}

		/// Returns entry associated with the filename in the shared cache if there is
		/// some. Otherwise, associates the given entry with the filename and returns
		/// it.
		const CachedFileSystemEntry &
		getOrInsertSharedEntryForFilename(StringRef Filename,
		const CachedFileSystemEntry &Entry) {
		return SharedCache.getShardForFilename(Filename)
		.getOrInsertEntryForFilename(Filename, Entry);
		}

/// The global cache shared between worker threads.		/// The global cache shared between worker threads.
DependencyScanningFilesystemSharedCache &SharedCache;		DependencyScanningFilesystemSharedCache &SharedCache;
/// The local cache is used by the worker thread to cache file system queries		/// The local cache is used by the worker thread to cache file system queries
/// locally instead of querying the global cache every time.		/// locally instead of querying the global cache every time.
DependencyScanningFilesystemLocalCache LocalCache;		DependencyScanningFilesystemLocalCache LocalCache;
/// The optional mapping structure which records information about the		/// The optional mapping structure which records information about the
/// excluded conditional directive skip mappings that are used by the		/// excluded conditional directive skip mappings that are used by the
/// currently active preprocessor.		/// currently active preprocessor.
Show All 10 Lines

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp

Show All 10 Lines

#include "llvm/Support/MemoryBuffer.h" #include "llvm/Support/MemoryBuffer.h"

#include "llvm/Support/SmallVectorMemoryBuffer.h" #include "llvm/Support/SmallVectorMemoryBuffer.h"

#include "llvm/Support/Threading.h" #include "llvm/Support/Threading.h"

using namespace clang; using namespace clang;

using namespace tooling; using namespace tooling;

using namespace dependencies; using namespace dependencies;

llvm::ErrorOr<llvm::vfs::Status> llvm::ErrorOr<DependencyScanningWorkerFilesystem::TentativeEntry>

CachedFileSystemEntry::initFile(StringRef Filename, llvm::vfs::FileSystem &FS) { DependencyScanningWorkerFilesystem::readFile(StringRef Filename) {

// Load the file and its content from the file system. // Load the file and its content from the file system.

auto MaybeFile = FS.openFileForRead(Filename); auto MaybeFile = getUnderlyingFS().openFileForRead(Filename);

if (!MaybeFile) if (!MaybeFile)

return MaybeFile.getError(); return MaybeFile.getError();

dexonsmithUnsubmitted

Not Done

In what circumstances should this return a cached-error TentativeEntry? Any?

dexonsmith: In what circumstances should this return a cached-error TentativeEntry? Any?

jansvoboda11AuthorUnsubmitted

Done

I don't think the distinction matters at this level. Whether failures should be cached is a decision that's being made one level up.

I personally prefer having TentativeEntry to be "non-fallible" and explicitly wrapping the whole thing in ErrorOr. That makes it easier for others to know what they are working with (i.e. this object cannot represent an error state). Eventually, I think this would make sense for the caches and CachedFileSystemEntry too.

jansvoboda11: I don't think the distinction matters at this level. Whether failures should be cached is a…

auto File = std::move(*MaybeFile); auto File = std::move(*MaybeFile);

auto MaybeStat = File->status(); auto MaybeStat = File->status();

if (!MaybeStat) if (!MaybeStat)

return MaybeStat.getError(); return MaybeStat.getError();

dexonsmithUnsubmitted

Not Done

Since the file was opened, should we return cached-error TentativeEntry here, rather than an error?

dexonsmith: Since the file was opened, should we return cached-error TentativeEntry here, rather than an…

jansvoboda11AuthorUnsubmitted

Done

Let's return an error here and create error CachedFileSystemEntry one level up.

jansvoboda11: Let's return an error here and create error `CachedFileSystemEntry` one level up.

auto Stat = std::move(*MaybeStat); auto Stat = std::move(*MaybeStat);

auto MaybeBuffer = File->getBuffer(Stat.getName()); auto MaybeBuffer = File->getBuffer(Stat.getName());

dexonsmithUnsubmitted

Not Done

Is there a potential (already existing) race condition here? Can't the file change between the stat and opening the buffer?

Seems like either:

The Stat should be updated to have the observed size of the buffer.
An error should be returned if the size doesn't match.
The stat and/or read should be retried until they match.

dexonsmith: Is there a potential (already existing) race condition here? Can't the file change between the…

jansvoboda11AuthorUnsubmitted

Done

I think that's right. I left a detailed FIXME in the code calling read() and would like to tackle that in a follow up. Would that be fine?

jansvoboda11: I think that's right. I left a detailed FIXME in the code calling `read()` and would like to…

dexonsmithUnsubmitted

Not Done

Yup, doing it in a separate commit makes sense. I suggest taking the first option, since it's the simplest.

dexonsmith: Yup, doing it in a separate commit makes sense. I suggest taking the first option, since it's…

if (!MaybeBuffer) if (!MaybeBuffer)

return MaybeBuffer.getError(); return MaybeBuffer.getError();

dexonsmithUnsubmitted

Not Done

After a successful stat on the same file descriptor, it definitely feels like this is an error that should be cached, and a TentativeEntry that is in an error state should be returned.

dexonsmith: After a successful stat on the same file descriptor, it definitely feels like this is an error…

auto Buffer = std::move(*MaybeBuffer); auto Buffer = std::move(*MaybeBuffer);

OriginalContents = std::move(Buffer); // If the file size changed between read and stat, pretend it didn't.

return Stat; if (Stat.getSize() != Buffer->getBufferSize())

Stat = llvm::vfs::Status::copyWithNewSize(Stat, Buffer->getBufferSize());

return TentativeEntry(Stat, std::move(Buffer));

} }

void CachedFileSystemEntry::minimizeFile() { EntryRef DependencyScanningWorkerFilesystem::minimizeIfNecessary(

assert(OriginalContents && "minimizing missing contents"); const CachedFileSystemEntry &Entry, StringRef Filename) {

if (Entry.isError() || Entry.isDirectory() || !shouldMinimize(Filename))

return EntryRef(/*Minimized=*/false, Filename, Entry);

CachedFileContents *Contents = Entry.getContents();

assert(Contents && "contents not initialized");

// Double-checked locking.

if (Contents->MinimizedAccess.load())

return EntryRef(/*Minimized=*/true, Filename, Entry);

std::lock_guard<std::mutex> GuardLock(Contents->ValueLock);

// Double-checked locking.

if (Contents->MinimizedAccess.load())

return EntryRef(/*Minimized=*/true, Filename, Entry);

llvm::SmallString<1024> MinimizedFileContents; llvm::SmallString<1024> MinimizedFileContents;

// Minimize the file down to directives that might affect the dependencies. // Minimize the file down to directives that might affect the dependencies.

SmallVector<minimize_source_to_dependency_directives::Token, 64> Tokens; SmallVector<minimize_source_to_dependency_directives::Token, 64> Tokens;

if (minimizeSourceToDependencyDirectives(OriginalContents->getBuffer(), if (minimizeSourceToDependencyDirectives(Contents->Original->getBuffer(),

dexonsmithUnsubmitted

Done

This will introduce a memory regression in the common case where there are no PCHs.

Previously, only minimized files were saved in memory. These are relatively small, so probably no big deal.
Now, the original file is being saved as well. These are not small.

Instead, the MemoryBuffer should be saved directly.

For large files whose size isn't on a page boundary, this will be an mmap. This doesn't count against process memory because the kernel can optimize this easily, such as by sharing between processes (e.g., with actual compilation).
For large files on page boundaries, there was already a memcpy done in order to make this null-terminated. No reason to do that again here.
For small files, this is already a buffer on the heap... the extra memcpy and allocation probably doesn't matter all that much, but the large file case is worth optimizing for.

This wasteful was already around when files weren't being minimized files, but it's going to use a lot more memory now that original files are stored even when they're going to be minimized.

dexonsmith: This will introduce a memory regression in the common case where there are no PCHs.

MinimizedFileContents, Tokens)) { MinimizedFileContents, Tokens)) {

// FIXME: Propagate the diagnostic if desired by the client. // FIXME: Propagate the diagnostic if desired by the client.

// Use the original file if the minimization failed. // Use the original file if the minimization failed.

MinimizedContentsStorage = Contents->MinimizedStorage =

llvm::MemoryBuffer::getMemBuffer(*OriginalContents); llvm::MemoryBuffer::getMemBuffer(*Contents->Original);

MinimizedContentsAccess.store(MinimizedContentsStorage.get()); Contents->MinimizedAccess.store(Contents->MinimizedStorage.get());

return; return EntryRef(/*Minimized=*/true, Filename, Entry);

} }

// The contents produced by the minimizer must be null terminated. // The contents produced by the minimizer must be null terminated.

assert(MinimizedFileContents.data()[MinimizedFileContents.size()] == '\0' && assert(MinimizedFileContents.data()[MinimizedFileContents.size()] == '\0' &&

"not null terminated contents"); "not null terminated contents");

// Compute the skipped PP ranges that speedup skipping over inactive // Compute the skipped PP ranges that speedup skipping over inactive

// preprocessor blocks. // preprocessor blocks.

llvm::SmallVector<minimize_source_to_dependency_directives::SkippedRange, 32> llvm::SmallVector<minimize_source_to_dependency_directives::SkippedRange, 32>

SkippedRanges; SkippedRanges;

minimize_source_to_dependency_directives::computeSkippedRanges(Tokens, minimize_source_to_dependency_directives::computeSkippedRanges(Tokens,

SkippedRanges); SkippedRanges);

PreprocessorSkippedRangeMapping Mapping; PreprocessorSkippedRangeMapping Mapping;

for (const auto &Range : SkippedRanges) { for (const auto &Range : SkippedRanges) {

if (Range.Length < 16) { if (Range.Length < 16) {

// Ignore small ranges as non-profitable. // Ignore small ranges as non-profitable.

// FIXME: This is a heuristic, its worth investigating the tradeoffs // FIXME: This is a heuristic, its worth investigating the tradeoffs

// when it should be applied. // when it should be applied.

continue; continue;

} }

Mapping[Range.Offset] = Range.Length; Mapping[Range.Offset] = Range.Length;

} }

PPSkippedRangeMapping = std::move(Mapping); Contents->PPSkippedRangeMapping = std::move(Mapping);

MinimizedContentsStorage = std::make_unique<llvm::SmallVectorMemoryBuffer>( Contents->MinimizedStorage = std::make_unique<llvm::SmallVectorMemoryBuffer>(

std::move(MinimizedFileContents)); std::move(MinimizedFileContents));

// The algorithm in `getOrCreateFileSystemEntry` uses the presence of // This function performed double-checked locking using `MinimizedAccess`.

// minimized contents to decide whether an entry is up-to-date or not. // Assigning it must be the last thing this function does. If we were to

// If it is up-to-date, the skipped range mappings must be already computed. // assign it before `PPSkippedRangeMapping`, other threads may skip the

// This is why we need to store the minimized contents **after** storing the // critical section (`MinimizedAccess != nullptr`) and access the mappings

// skipped range mappings. Failing to do so would lead to a data race. // that are about to be initialized, leading to a data race.

MinimizedContentsAccess.store(MinimizedContentsStorage.get()); Contents->MinimizedAccess.store(Contents->MinimizedStorage.get());

return EntryRef(/*Minimized=*/true, Filename, Entry);

} }

DependencyScanningFilesystemSharedCache:: DependencyScanningFilesystemSharedCache::

DependencyScanningFilesystemSharedCache() { DependencyScanningFilesystemSharedCache() {

// This heuristic was chosen using a empirical testing on a // This heuristic was chosen using a empirical testing on a

// reasonably high core machine (iMacPro 18 cores / 36 threads). The cache // reasonably high core machine (iMacPro 18 cores / 36 threads). The cache

// sharding gives a performance edge by reducing the lock contention. // sharding gives a performance edge by reducing the lock contention.

// FIXME: A better heuristic might also consider the OS to account for // FIXME: A better heuristic might also consider the OS to account for

// the different cost of lock contention on different OSes. // the different cost of lock contention on different OSes.

NumShards = NumShards =

std::max(2u, llvm::hardware_concurrency().compute_thread_count() / 4); std::max(2u, llvm::hardware_concurrency().compute_thread_count() / 4);

CacheShards = std::make_unique<CacheShard[]>(NumShards); CacheShards = std::make_unique<CacheShard[]>(NumShards);

} }

DependencyScanningFilesystemSharedCache::SharedFileSystemEntry & DependencyScanningFilesystemSharedCache::CacheShard &

DependencyScanningFilesystemSharedCache::get(StringRef Key) { DependencyScanningFilesystemSharedCache::getShardForFilename(

CacheShard &Shard = CacheShards[llvm::hash_value(Key) % NumShards]; StringRef Filename) const {

std::lock_guard<std::mutex> LockGuard(Shard.CacheLock); return CacheShards[llvm::hash_value(Filename) % NumShards];

auto It = Shard.Cache.try_emplace(Key); }

dexonsmithUnsubmitted

Not Done

This name is a bit misleading... looks more like getOrCreateFileContents() to me.

dexonsmith: This name is a bit misleading... looks more like `getOrCreateFileContents()` to me.

return It.first->getValue();

DependencyScanningFilesystemSharedCache::CacheShard &

DependencyScanningFilesystemSharedCache::getShardForUID(

llvm::sys::fs::UniqueID UID) const {

auto Hash = llvm::hash_combine(UID.getDevice(), UID.getFile());

return CacheShards[Hash % NumShards];

}

const CachedFileSystemEntry *

DependencyScanningFilesystemSharedCache::CacheShard::findEntryByFilename(

StringRef Filename) const {

std::lock_guard<std::mutex> LockGuard(CacheLock);

auto It = EntriesByFilename.find(Filename);

return It == EntriesByFilename.end() ? nullptr : It->getValue();

}

const CachedFileSystemEntry *

DependencyScanningFilesystemSharedCache::CacheShard::findEntryByUID(

llvm::sys::fs::UniqueID UID) const {

std::lock_guard<std::mutex> LockGuard(CacheLock);

auto It = EntriesByUID.find(UID);

return It == EntriesByUID.end() ? nullptr : It->getSecond();

}

const CachedFileSystemEntry &

DependencyScanningFilesystemSharedCache::CacheShard::

getOrEmplaceEntryForFilename(StringRef Filename,

llvm::ErrorOr<llvm::vfs::Status> Stat) {

std::lock_guard<std::mutex> LockGuard(CacheLock);

auto Insertion = EntriesByFilename.insert({Filename, nullptr});

if (Insertion.second)

Insertion.first->second =

new (EntryStorage.Allocate()) CachedFileSystemEntry(std::move(Stat));

return *Insertion.first->second;

}

const CachedFileSystemEntry &

DependencyScanningFilesystemSharedCache::CacheShard::getOrEmplaceEntryForUID(

llvm::sys::fs::UniqueID UID, llvm::vfs::Status Stat,

std::unique_ptr<llvm::MemoryBuffer> Contents) {

std::lock_guard<std::mutex> LockGuard(CacheLock);

auto Insertion = EntriesByUID.insert({UID, nullptr});

if (Insertion.second) {

CachedFileContents *StoredContents = nullptr;

if (Contents)

StoredContents = new (ContentsStorage.Allocate())

CachedFileContents(std::move(Contents));

dexonsmithUnsubmitted

Done

return *Insertion.first->second;

}

const CachedFileSystemEntry &

DependencyScanningFilesystemSharedCache::CacheShard::getOrEmplaceEntryForUID(

llvm::sys::fs::UniqueID UID, llvm::vfs::Status Stat,

- CachedFileContents *Contents) {

+ std::unique_ptr<MemoryBuffer> OriginalContents) {

std::lock_guard<std::mutex> LockGuard(CacheLock);

auto Insertion = EntriesByUID.insert({UID, nullptr});

- if (Insertion.second)

+ if (Insertion.second) {

+ CachedFileContents *Contents = OriginalContents

+ ? new (ContentsStorage.Allocate())

+ CachedFileContents(std::move(OriginalContents));

+ : nullptr;

Insertion.first->second = new (EntryStorage.Allocate())

CachedFileSystemEntry(std::move(Stat), Contents);

+ }

return *Insertion.first->second;

}

const CachedFileSystemEntry &

Looks like the two UID maps are always filled directly after each other. Seems like we can reduce lookups like this.

dexonsmith: Looks like the two UID maps are always filled directly after each other. Seems like we can…

Insertion.first->second = new (EntryStorage.Allocate())

CachedFileSystemEntry(std::move(Stat), StoredContents);

}

return *Insertion.first->second;

}

const CachedFileSystemEntry &

DependencyScanningFilesystemSharedCache::CacheShard::

getOrInsertEntryForFilename(StringRef Filename,

const CachedFileSystemEntry &Entry) {

std::lock_guard<std::mutex> LockGuard(CacheLock);

return *EntriesByFilename.insert({Filename, &Entry}).first->getValue();

} }

dexonsmithUnsubmitted

Done

I think this can be merged into getOrEmplaceEntryForUID.

dexonsmith: I think this can be merged into getOrEmplaceEntryForUID.

/// Whitelist file extensions that should be minimized, treating no extension as /// Whitelist file extensions that should be minimized, treating no extension as

/// a source file that should be minimized. /// a source file that should be minimized.

/// ///

/// This is kinda hacky, it would be better if we knew what kind of file Clang /// This is kinda hacky, it would be better if we knew what kind of file Clang

/// was expecting instead. /// was expecting instead.

static bool shouldMinimizeBasedOnExtension(StringRef Filename) { static bool shouldMinimizeBasedOnExtension(StringRef Filename) {

StringRef Ext = llvm::sys::path::extension(Filename); StringRef Ext = llvm::sys::path::extension(Filename);

Show All 27 Lines bool DependencyScanningWorkerFilesystem::shouldMinimize(StringRef RawFilename) {

if (!shouldMinimizeBasedOnExtension(RawFilename)) if (!shouldMinimizeBasedOnExtension(RawFilename))

return false; return false;

llvm::SmallString<256> Filename; llvm::SmallString<256> Filename;

llvm::sys::path::native(RawFilename, Filename); llvm::sys::path::native(RawFilename, Filename);

return !NotToBeMinimized.contains(Filename); return !NotToBeMinimized.contains(Filename);

} }

void CachedFileSystemEntry::init(llvm::ErrorOr<llvm::vfs::Status> &&MaybeStatus, const CachedFileSystemEntry &

StringRef Filename, DependencyScanningWorkerFilesystem::getOrEmplaceSharedEntryForUID(

llvm::vfs::FileSystem &FS) { TentativeEntry TEntry) {

if (!MaybeStatus || MaybeStatus->isDirectory()) auto &Shard = SharedCache.getShardForUID(TEntry.Status.getUniqueID());

MaybeStat = std::move(MaybeStatus); return Shard.getOrEmplaceEntryForUID(TEntry.Status.getUniqueID(),

else std::move(TEntry.Status),

MaybeStat = initFile(Filename, FS); std::move(TEntry.Contents));

}

dexonsmithUnsubmitted

Done

return !NotToBeMinimized.contains(Filename);

}

const CachedFileSystemEntry &

DependencyScanningWorkerFilesystem::getOrEmplaceSharedEntryForUID(

TentativeEntry TEntry) {

auto &Shard = SharedCache.getShardForUID(TEntry.Status.getUniqueID());

- CachedFileContents *Contents = nullptr;

- if (TEntry.Contents)

- Contents = &Shard.getOrEmplaceContentsForUID(TEntry.Status.getUniqueID(),

- std::move(TEntry.Contents));

return Shard.getOrEmplaceEntryForUID(TEntry.Status.getUniqueID(),

- std::move(TEntry.Status), Contents);

+ std::move(TEntry.Status),

+ std::move(TEntry.Contents));

}

const CachedFileSystemEntry *

I think this can be one call since they're taking the same lock and always done one after the other.

dexonsmith: I think this can be one call since they're taking the same lock and always done one after the…

const CachedFileSystemEntry *

DependencyScanningWorkerFilesystem::findEntryByFilenameWithWriteThrough(

StringRef Filename) {

if (const auto *Entry = LocalCache.findEntryByFilename(Filename))

return Entry;

auto &Shard = SharedCache.getShardForFilename(Filename);

if (const auto *Entry = Shard.findEntryByFilename(Filename))

return &LocalCache.insertEntryForFilename(Filename, *Entry);

return nullptr;

}

llvm::ErrorOr<const CachedFileSystemEntry &>

dexonsmithUnsubmitted

Done

I don't love the lack of clarity between when the lock is taken and when it's released caused by this being an out parameter. I don't have a specific suggestion, but maybe there's another way to factor the code overall?

dexonsmith: I don't love the lack of clarity between when the lock is taken and when it's released caused…

jansvoboda11AuthorUnsubmitted

Done

Fair point, I'll try to simplify this.

jansvoboda11: Fair point, I'll try to simplify this.

DependencyScanningWorkerFilesystem::computeAndStoreResult(StringRef Filename) {

llvm::ErrorOr<llvm::vfs::Status> Stat = getUnderlyingFS().status(Filename);

if (!Stat) {

dexonsmithUnsubmitted

Not Done

Calling readFile() behind a lock doesn't seem great. I did confirm that the original code seems to do the same thing (lock outside of createFilesystemEntry), but this refactor seems to bake the pattern into a few more places.

When races aren't very likely it's usually cheaper to:

lock to check cache, returning cached result if so
without a lock, compute result
lock to set cache, but if the cache has been filled in the meantime by another thread, return that and throw out the just-computed one

Maybe it'd be useful to add:

std::atomic<bool> IsInitialized;

to the MinimizedContents and OriginalContents structures stored in the shared cache. This could make it easier to decouple insertion in the shared cache from initialization. I.e., it'd be safe to release the lock while doing work; another thread won't think the default-constructed contents are correct.

dexonsmith: Calling `readFile()` behind a lock doesn't seem great. I did confirm that the original code…

jansvoboda11AuthorUnsubmitted

Done

Could you expand on this a bit more? If we have a lock for each file, how is locking, reading, unlocking slower than locking, unlocking, reading, locking, unlocking?

jansvoboda11: Could you expand on this a bit more? If we have a lock for each file, how is locking, reading…

dexonsmithUnsubmitted

Not Done

You're right; if there's a lock per-file and all consumers want the result of all computations there's no benefit to releasing the lock quickly.

If some consumers only want partial results (or already-computed results), can be faster to release quickly.
Could be expensive to have mutexes per-file, since that's A LOT of mutexes. It might be cheaper in aggregate to switch to lock-free here.

dexonsmith: You're right; if there's a lock per-file and all consumers want the result of all computations…

if (!shouldCacheStatFailures(Filename))

return Stat.getError();

const auto &Entry =

getOrEmplaceSharedEntryForFilename(Filename, Stat.getError());

return insertLocalEntryForFilename(Filename, Entry);

}

if (const auto *Entry = findSharedEntryByUID(*Stat))

return insertLocalEntryForFilename(Filename, *Entry);

auto TEntry =

Stat->isDirectory() ? TentativeEntry(*Stat) : readFile(Filename);

const CachedFileSystemEntry *SharedEntry = [&]() {

if (TEntry) {

const auto &UIDEntry = getOrEmplaceSharedEntryForUID(std::move(*TEntry));

dexonsmithUnsubmitted

Not Done

I'm not quite following this logic. I think it's safe (and important!) to modify MaybeStat if read() fails.

We're in a critical section that either creates and partially initializes the entry, or incrementally updates it.

In the "creates and partially initializes" case:

All other workers will get nullptr for Cache.getEntry() and try to enter the critical section.
We have just seen a successful MaybeStat value.
needsRead() will be true since we have not read contents before. We will immediately try to read.
read() should open the original contents and can safely:
- on success, update the value for MaybeStat to match the observed size
- on failure, drop the value and set the error for MaybeStat to the observed error
When we leave the critical section, either:
- MaybeStat stores an error; no thread will enter the critical section again
- OriginalContents are initialized and needsRead() returns false

In the "incrementally updates" case:

needsRead() returns false so read() will not be called

dexonsmith: I'm not quite following this logic. I think it's safe (and important!) to modify MaybeStat if…

dexonsmithUnsubmitted

Not Done

I think to avoid this race you need to delay creating "UID to content" map entry until there is the result of a successful read() to store.

I'll describe an algorithm that I think is fairly clean that handles this. I'm using different data structure names to avoid confusion since I've broken it down a little differently:

ReadResult: stat (for directories) OR error and uid (failed read) OR stat and content (and optional minimized content and pp ranges, and a way to update them atomically))
FilenameMap: map from Filename to ReadResult* (shared and sharded; mirrored locally in each worker)
UIDMap: map from UID to ReadResult* (shared and sharded; probably no local mirror)

And here's the algorithm:

// Top-level API: get the entry/result for some filename.
ErrorOr<ReadResult &> getOrCreateResult(StringRef Filename) {
  if (ReadResult *Result = lookupEntryForFilename(Filename))
    return minimizeIfNecessary(*Result, ShouldMinimize);
  if (ErrorOr<ReadResult &> Result = computeAndStoreResult(Filename))
    return minimizeIfNecessary(*Result, ShouldMinimize);
  else
    return Result.getError();
}
// Compute and store an entry/result for some filename. Returned result
// has in-sync stat+read info (assuming read was successful).
ErrorOr<ReadResult &> computeAndStoreResult(StringRef Filename) {
  ErrorOr<Status> Stat = UnderlyingFS->status(Filename);
  if (!Stat)
    return Stat.getError(); // Can't cache missing files.
  if (ReadResult *Result = lookupEntryForUID(Stat->UID))
    return storeFilenameEntry(Filename, *Result); // UID already known.
  // UID not known. Compute a ReadResult.
  //
  // Unless this is a directory (where we don't need to go back to the FS),
  // ignore existing 'Stat' because without an open file descriptor the UID
  // could change.
  Optional<ReadResult> Result;
  if (Stat->isDirectory())
    Result = ReadResult(*Stat);
  else if (ErrorOr<ReadResult> MaybeResult = computeReadResult(Filename))
    Result = std::move(*MaybeResult);
  else
    return MaybeResult.getError(); // File disappeared...
  // Store the result. Cascade through UID then Filename. Each level could
  // return a different result than it was passed in.
  return storeEntryForFilenameOrReturnExisting(Filename,
             storeEntryForUIDOrReturnExisting(std::move(*Result));
}
// Lookup existing result in FilenameMap. No mutation. First checks local map
// then falls back to the shared map (locks shard, lookup, unlocks, saves in
// local map, returns).
ReadResult *lookupEntryForFilename(StringRef Filename);
// Lookup existing result in UIDMap. No mutation. No local map, just a shared
// map (lockshard+lookup+return).
ReadResult *lookupEntryForUID(UniqueID);
// Compute read result using a single file descriptor.
// - Return error if `open()` fails. Can't cache missing files.
// - Else compute ReadResult: Stat open file descriptor and get a memory buffer from it.
// Note: "Error" state if stat fails.
// Note: "Error" state if stat succeeds and memory buffer does not open.
// Note: if the memory buffer opens successfully, status updated with observed size.
// Note: does not take a UID parameter since live FS could have changed.
// Note: does not access or mutate UIDMap/FilenameMap/etc.
ErrorOr<ReadResult> computeReadResult(StringRef Filename);
// Compare-exchange. Pulls UID out of NewResult. Locks shard for UIDMap[UID]; checks for
// existing result; if none, bump-ptr-allocates and stores NewResult; returns stored
// result.
ReadResult& storeEntryForUIDOrReturnExisting(ReadResult &&NewResult);
// Compare-exchange. Locks shard for FilenameMap[Filename]; checks for existing result;
// if none, stores parameter; unlocks; updates local map with stored result and returns
// it.
ReadResult& storeEntryForFilenameOrReturnExisting(StringRef Filename, ReadResult &);
// If needed and missing, adds minimization info atomically. Note that Result
// may store a cached read error, or a directory.
ReadResult& minimizeIfNecessary(ReadResult& Result, bool ShouldMinimize);

dexonsmith: The key property being the last bullet of the first case: that the "create and partially…

jansvoboda11AuthorUnsubmitted

Done

This patch (and D115346) were motivated by D114971, which prevents minimization of symlinks that point to files referenced by precompiled dependencies (e.g. a PCH).

When the dependency scanning worker disables minimization of a file referenced by a precompiled dependency, my idea was to immediately "canonicalize" the filename to UniqueIDs (through stat) and use that when deciding whether to minimize its contents in getOrCreateFileSystemEntry. With that approach, the "filename -> UniqueID" map/cache acts as the authority for stat information. However, I can see how this breaks when the FS is volatile. Besides the issue outlined in your last comment, the current approach in D114971 prolongs the pause between initial stat (when disabling file minimization - "configure time") and read ("query time"), increasing the chances for observing filesystem volatility.

I think your approach makes a lot of sense if we want to be really defensive against volatile FS. Making sure we don't have to re-stat or read files that were already stat-ed during "configuration" or previous "query" would be nice. I think that unfortunately means we need to actually read all input files of precompiled dependencies in D114971. Just stat-ing such files is no longer an option, since that would get us back to square one if we need to later read them (they might've changed).

I have implemented your idea locally, and will update this patch tomorrow.

jansvoboda11: This patch (and D115346) were motivated by D114971, which prevents minimization of symlinks…

return &getOrInsertSharedEntryForFilename(Filename, UIDEntry);

dexonsmithUnsubmitted

Not Done

Seems like the existing stat value should be passed into read() and the second stat there removed.

dexonsmith: Seems like the existing stat value should be passed into `read()` and the second stat there…

}

return &getOrEmplaceSharedEntryForFilename(Filename, TEntry.getError());

}();

jansvoboda11AuthorUnsubmitted

Done

I'm not sure these should be separate. We could end up in situation where the Filename map contains different entry than the UID map for the same directory entry. I'm tempted to merge these functions into one and perform the updates in a single critical section...

jansvoboda11: I'm not sure these should be separate. We could end up in situation where the Filename map…

dexonsmithUnsubmitted

Not Done

I'm not sure these should be separate. We could end up in situation where the Filename map contains different entry than the UID map for the same directory entry.

I'm also sure precisely what you mean by "for the same directory entry" in this context; and I don't see what's wrong with the situation I think you're outlining.

I'm tempted to merge these functions into one and perform the updates in a single critical section...

A single critical section for setting UID and filename at the same time would be hard to get right (and efficient), since UIDs have aliases through other filenames due to different directory paths (dir/../x.h vs x.h) and filesystem links (hard and symbolic).

Here's the race that I think(?) you're worried about:

Worker1 does a tentative stat of "x.h", finds a UID that isn't mapped (UIDX1, but it's ignored...).
Worker2 does a tentative stat of "x.h", finds a UID that isn't mapped (UIDX1, but it's ignored...).
Worker1 opens "x.h", finds ContentX1+StatX1 (with UIDX1), saves mapping UIDX1 -> ContentX1+StatX1.
"x.h" changes.
Worker2 opens "x.h", finds ContentX2+StatX2 (with UIDX2), saves mapping UIDX2 -> ContentX2+StatX2.
Worker2 saves mapping "x.h" -> ContentX2+StatX2.
Both workers move forward with "x.h" -> ContentX2+StatX2.

IIUC, you're concerned that the mapping UIDX1 -> ContentX1+StatX1 was saved. The side effect is that if a future tentative stat of (e.g.) "y.h" returns UIDX1, then "y.h" will be mapped to ContentX1+StatX1. Is this what concerns you? Why? (Is it something else?)

The concern I have is that some filesystems recycle UIDs (maybe "x.h" *was* a symbolic link to "y.h" and then became its own file... or maybe "x.h" and "y.h" were hard links... or maybe "y.h" is just a new file!). But that's a problem with using UIDs to detect equivalent filesystem links / content in general. I don't see any reason to be more concerned here than elsewhere, and to avoid depending on UID we'd need a pretty different design (e.g., lazily detect and model directory structure and symbolic links).

dexonsmith: > I'm not sure these should be separate. We could end up in situation where the Filename map…

jansvoboda11AuthorUnsubmitted

Done

Yes, that's the kind of scenario I was thinking about. I'm not concerned about consequences of that side effect, I just don't like storing garbage that will most likely never be used/referenced again and might be confusing during debugging.

I agree with you on UID recycling...

jansvoboda11: Yes, that's the kind of scenario I was thinking about. I'm not concerned about consequences of…

return insertLocalEntryForFilename(Filename, *SharedEntry);

} }

llvm::ErrorOr<EntryRef> llvm::ErrorOr<EntryRef>

DependencyScanningWorkerFilesystem::getOrCreateFileSystemEntry( DependencyScanningWorkerFilesystem::getOrCreateFileSystemEntry(

StringRef Filename) { StringRef Filename) {

bool ShouldBeMinimized = shouldMinimize(Filename); if (const auto *Entry = findEntryByFilenameWithWriteThrough(Filename))

return minimizeIfNecessary(*Entry, Filename).unwrapError();

const auto *Entry = LocalCache.getCachedEntry(Filename); auto MaybeEntry = computeAndStoreResult(Filename);

if (Entry && !Entry->needsUpdate(ShouldBeMinimized)) if (!MaybeEntry)

return EntryRef(ShouldBeMinimized, *Entry).unwrapError(); return MaybeEntry.getError();

return minimizeIfNecessary(*MaybeEntry, Filename).unwrapError();

// FIXME: Handle PCM/PCH files.

// FIXME: Handle module map files.

auto &SharedCacheEntry = SharedCache.get(Filename);

{

std::lock_guard<std::mutex> LockGuard(SharedCacheEntry.ValueLock);

CachedFileSystemEntry &CacheEntry = SharedCacheEntry.Value;

if (!CacheEntry.isInitialized()) {

auto MaybeStatus = getUnderlyingFS().status(Filename);

if (!MaybeStatus && !shouldCacheStatFailures(Filename))

// HACK: We need to always restat non source files if the stat fails.

// This is because Clang first looks up the module cache and module

// files before building them, and then looks for them again. If we

// cache the stat failure, it won't see them the second time.

return MaybeStatus.getError();

CacheEntry.init(std::move(MaybeStatus), Filename, getUnderlyingFS());

}

// Checking `needsUpdate` verifies the entry represents an opened file.

// Only checking `needsMinimization` could lead to minimization of files

// that we failed to load (such files don't have `OriginalContents`).

if (CacheEntry.needsUpdate(ShouldBeMinimized))

CacheEntry.minimizeFile();

}

// Store the result in the local cache.

Entry = &SharedCacheEntry.Value;

return EntryRef(ShouldBeMinimized, *Entry).unwrapError();

} }

llvm::ErrorOr<llvm::vfs::Status> llvm::ErrorOr<llvm::vfs::Status>

DependencyScanningWorkerFilesystem::status(const Twine &Path) { DependencyScanningWorkerFilesystem::status(const Twine &Path) {

SmallString<256> OwnedFilename; SmallString<256> OwnedFilename;

StringRef Filename = Path.toStringRef(OwnedFilename); StringRef Filename = Path.toStringRef(OwnedFilename);

llvm::ErrorOr<EntryRef> Result = getOrCreateFileSystemEntry(Filename); llvm::ErrorOr<EntryRef> Result = getOrCreateFileSystemEntry(Filename);

Show All 36 Lines

llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>> MinimizedVFSFile::create( llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>> MinimizedVFSFile::create(

EntryRef Entry, ExcludedPreprocessorDirectiveSkipMapping *PPSkipMappings) { EntryRef Entry, ExcludedPreprocessorDirectiveSkipMapping *PPSkipMappings) {

assert(!Entry.isError() && "error"); assert(!Entry.isError() && "error");

if (Entry.isDirectory()) if (Entry.isDirectory())

return std::make_error_code(std::errc::is_a_directory); return std::make_error_code(std::errc::is_a_directory);

auto Result = std::make_unique<MinimizedVFSFile>( auto Result = std::make_unique<MinimizedVFSFile>(

llvm::MemoryBuffer::getMemBuffer(Entry.getContents(), Entry.getName(), llvm::MemoryBuffer::getMemBuffer(Entry.getContents(),

Entry.getStatus().getName(),

/*RequiresNullTerminator=*/false), /*RequiresNullTerminator=*/false),

Entry.getStatus()); Entry.getStatus());

const auto *EntrySkipMappings = Entry.getPPSkippedRangeMapping(); const auto *EntrySkipMappings = Entry.getPPSkippedRangeMapping();

if (EntrySkipMappings && !EntrySkipMappings->empty() && PPSkipMappings) if (EntrySkipMappings && !EntrySkipMappings->empty() && PPSkipMappings)

(*PPSkipMappings)[Result->Buffer->getBufferStart()] = EntrySkipMappings; (*PPSkipMappings)[Result->Buffer->getBufferStart()] = EntrySkipMappings;

return llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>( return llvm::ErrorOr<std::unique_ptr<llvm::vfs::File>>(

Show All 13 Lines

clang/unittests/Tooling/DependencyScannerTest.cpp

Show First 20 Lines • Show All 218 Lines • ▼ Show 20 Lines	TEST(DependencyScanningFilesystem, IgnoredFilesAreCachedSeparately1) {
auto StatusMinimized0 = DepFS.status("/mod.h");		auto StatusMinimized0 = DepFS.status("/mod.h");
DepFS.disableMinimization("/mod.h");		DepFS.disableMinimization("/mod.h");
auto StatusFull1 = DepFS.status("/mod.h");		auto StatusFull1 = DepFS.status("/mod.h");

EXPECT_TRUE(StatusMinimized0);		EXPECT_TRUE(StatusMinimized0);
EXPECT_TRUE(StatusFull1);		EXPECT_TRUE(StatusFull1);
EXPECT_EQ(StatusMinimized0->getSize(), 17u);		EXPECT_EQ(StatusMinimized0->getSize(), 17u);
EXPECT_EQ(StatusFull1->getSize(), 30u);		EXPECT_EQ(StatusFull1->getSize(), 30u);
		EXPECT_EQ(StatusMinimized0->getName(), StringRef("/mod.h"));
		EXPECT_EQ(StatusFull1->getName(), StringRef("/mod.h"));
}		}

TEST(DependencyScanningFilesystem, IgnoredFilesAreCachedSeparately2) {		TEST(DependencyScanningFilesystem, IgnoredFilesAreCachedSeparately2) {
auto VFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();		auto VFS = llvm::makeIntrusiveRefCnt<llvm::vfs::InMemoryFileSystem>();
VFS->addFile("/mod.h", 0,		VFS->addFile("/mod.h", 0,
llvm::MemoryBuffer::getMemBuffer("#include <foo.h>\n"		llvm::MemoryBuffer::getMemBuffer("#include <foo.h>\n"
"// hi there!\n"));		"// hi there!\n"));

DependencyScanningFilesystemSharedCache SharedCache;		DependencyScanningFilesystemSharedCache SharedCache;
auto Mappings = std::make_unique<ExcludedPreprocessorDirectiveSkipMapping>();		auto Mappings = std::make_unique<ExcludedPreprocessorDirectiveSkipMapping>();
DependencyScanningWorkerFilesystem DepFS(SharedCache, VFS, Mappings.get());		DependencyScanningWorkerFilesystem DepFS(SharedCache, VFS, Mappings.get());

DepFS.disableMinimization("/mod.h");		DepFS.disableMinimization("/mod.h");
auto StatusFull0 = DepFS.status("/mod.h");		auto StatusFull0 = DepFS.status("/mod.h");
DepFS.enableMinimizationOfAllFiles();		DepFS.enableMinimizationOfAllFiles();
auto StatusMinimized1 = DepFS.status("/mod.h");		auto StatusMinimized1 = DepFS.status("/mod.h");

EXPECT_TRUE(StatusFull0);		EXPECT_TRUE(StatusFull0);
EXPECT_TRUE(StatusMinimized1);		EXPECT_TRUE(StatusMinimized1);
EXPECT_EQ(StatusFull0->getSize(), 30u);		EXPECT_EQ(StatusFull0->getSize(), 30u);
EXPECT_EQ(StatusMinimized1->getSize(), 17u);		EXPECT_EQ(StatusMinimized1->getSize(), 17u);
		EXPECT_EQ(StatusFull0->getName(), StringRef("/mod.h"));
		EXPECT_EQ(StatusMinimized1->getName(), StringRef("/mod.h"));
}		}

} // end namespace dependencies		} // end namespace dependencies
} // end namespace tooling		} // end namespace tooling
} // end namespace clang		} // end namespace clang

This is an archive of the discontinued LLVM Phabricator instance.

[clang][deps] Ensure filesystem cache consistencyClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 401934

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp

clang/unittests/Tooling/DependencyScannerTest.cpp

[clang][deps] Ensure filesystem cache consistency
ClosedPublic