This is an archive of the discontinued LLVM Phabricator instance.

[clang-scan-deps] Implementation of dependency scanner over minimized sources
ClosedPublic

Authored by arphaman on Jun 27 2019, 5:04 PM.

Details

Summary

This patch implements the fast dependency scanning mode in clang-scan-deps: the preprocessing is done on files that are minimized using the dependency directives source minimizer.

A shared file system cache is used to ensure that file system requests and source minimization are performed only once per file. The cache assumes that the underlying file system won't change during the course of the scan (or, if it does, that the changes won't affect the output), and its entries are never evicted. This means that the service and workers can be used for a single run of the dependency scanner and can't be reused across multiple, incremental runs; that is something we'll most likely support in the future, though. Note that the driver still uses the underlying real file system.
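A minimal sketch of the sharing scheme described above, under assumed names (SharedEntryCache, CachedEntry, and the Minimize callback are illustrative stand-ins, not the patch's actual classes):

#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Illustrative stand-in for the cached per-file data (stat result plus
// minimized contents).
struct CachedEntry {
  std::string MinimizedContents;
};

// One instance is shared by every worker thread for the duration of a scan.
// Entries are created once and never evicted, so references handed out
// below stay valid until the cache is destroyed.
class SharedEntryCache {
  std::mutex Lock;
  std::unordered_map<std::string, std::unique_ptr<CachedEntry>> Entries;

public:
  // Minimize Path at most once, no matter how many workers ask for it.
  // (The real patch shards the map and uses per-entry locks so that
  // minimization doesn't serialize globally; a single lock is used here
  // for brevity.)
  const CachedEntry &
  get(const std::string &Path,
      const std::function<std::string(const std::string &)> &Minimize) {
    std::lock_guard<std::mutex> Guard(Lock);
    std::unique_ptr<CachedEntry> &Slot = Entries[Path];
    if (!Slot)
      Slot = std::make_unique<CachedEntry>(CachedEntry{Minimize(Path)});
    return *Slot;
  }
};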

This patch is also still missing the fast skipping of skipped preprocessor blocks, the optimization that I mentioned in the EuroLLVM talk.

Diff Detail

Repository
rC Clang

Event Timeline

arphaman created this revision.Jun 27 2019, 5:04 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 27 2019, 5:04 PM
arphaman updated this revision to Diff 211385.Jul 23 2019, 4:52 PM

Fix a bug for empty minimized files where the lexer's null-terminator lookup was an out-of-bounds read

I will take a look next week when I get back!

aganea added inline comments.
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
2

General comment for this file and the implementation: it seems there's nothing specific to dependency scanning here, except for replacing the contents with minimized contents? This cached FS could very well be used to compile a project with parallel workers. Could this be part of the VirtualFileSystem infrastructure (llvm/include/llvm/Support/VirtualFileSystem.h)? If yes, could you possibly create a separate patch for this? @JDevlieghere @sammccall
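To illustrate the suggestion, a status-only caching proxy built on llvm::vfs::ProxyFileSystem could look roughly like the sketch below. This is just the idea being floated, not code from the patch; a real generic version would also need to cache file contents and handle invalidation.

#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/Support/VirtualFileSystem.h"
#include <mutex>
#include <utility>

// Sketch of a VFS wrapper that memoizes stat() results across calls.
class StatCachingFileSystem : public llvm::vfs::ProxyFileSystem {
  std::mutex Lock;
  llvm::StringMap<llvm::ErrorOr<llvm::vfs::Status>> StatCache;

public:
  explicit StatCachingFileSystem(
      llvm::IntrusiveRefCntPtr<llvm::vfs::FileSystem> FS)
      : ProxyFileSystem(std::move(FS)) {}

  llvm::ErrorOr<llvm::vfs::Status> status(const llvm::Twine &Path) override {
    llvm::SmallString<256> PathStorage;
    llvm::StringRef Key = Path.toStringRef(PathStorage);
    std::lock_guard<std::mutex> Guard(Lock);
    auto It = StatCache.find(Key);
    if (It != StatCache.end())
      return It->getValue();
    // Miss: forward to the underlying FS once and remember the result.
    llvm::ErrorOr<llvm::vfs::Status> Result = ProxyFileSystem::status(Path);
    StatCache.insert({Key, Result});
    return Result;
  }
};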

102

The struct is self-explanatory, not sure the comment is needed here?

104

Would you mind renaming this to ValueLock so it is easier to search for? (and to remain consistent with CacheLock below)

130

Maybe worth mentioning that this is NOT a global, thread-safe class like DependencyScanningFilesystemSharedCache, but rather meant to be used as per-thread instances?

I am also wondering if there could be a better name to signify at first sight that this is a per-thread class. DependencyScanningLocalFilesystem? DependencyScanningWorkerFilesystem?

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
85

This line needs a comment. Is this value based on empirical results across different hardware? (I can't recall your answer at the LLVM conf.) I am wondering what a good strategy would be here: the more cores/HT in your PC, the higher the chance that you'll hit the same shard, thus locking; but the bigger we make this value, the more StringMaps we will have, and possibly more cache misses.
Should it be something like llvm::hardware_concurrency() / some_fudge? It'd be interesting to subsequently profile on high-core-count machines (or maybe you have done that already).

211

Don't use auto when the type is not obvious.

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
148

Can we not use caching all the time?

arphaman updated this revision to Diff 212435.Jul 30 2019, 2:06 PM
arphaman marked 7 inline comments as done.
arphaman added inline comments.
clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
2

I think it could be possible to separate out the cache, but we definitely need a subclass of the VFS to handle some Clang-specific logic, for instance how to deal with module files. I wouldn't be opposed to factoring it out if people are interested, but I think that can be done as a follow-up if there's interest.

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
85

I rewrote it to use a heuristic we settled on after doing empirical testing on an 18-core / 36-thread machine (max(2, thread_count / 4)), and added a comment. This is where the number 9 (36 / 4) that I mentioned at the talk came from; we only arrived at it through this heuristic. I think that after some SDK/OS updates the effect of adding more shards is now less pronounced, so it mostly flatlines past some number between 5 and 10. A better heuristic would probably be OS-specific and take the cost of lock contention into account.

Note that an increased number of shards does not increase the number of cache misses, because the shard number is determined by the filename (we don't actually have global cache misses, as the cache is designed to have only one miss per file, when it's first accessed). It's just that an increased number of shards doesn't improve performance past some limit, so we want to find the point where it tapers off.

It would still definitely be interesting to profile it on other high-core-count machines with different OSes to see whether the heuristic is reasonable for those scenarios too.
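A sketch of the two points above, the shard-count heuristic and the filename-based shard selection; CacheShard and its contents are simplified stand-ins for the patch's real types:

#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringMap.h"
#include "llvm/ADT/StringRef.h"
#include <algorithm>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Simplified stand-in for a shard: one lock plus one map per shard.
struct CacheShard {
  std::mutex CacheLock;
  llvm::StringMap<std::string> Cache; // value type is a placeholder
};

// Heuristic discussed above: more shards spread lock contention, but the
// benefit flattens out, so tie the count loosely to the thread count.
static unsigned getShardCount() {
  unsigned HW = std::thread::hardware_concurrency(); // may be 0 on some platforms
  return std::max(2u, HW / 4);
}

// The shard is picked from the filename alone, so a given file always maps
// to the same shard; changing the shard count never adds cache misses, it
// only changes how contention is distributed.
static CacheShard &getShardForFile(std::vector<CacheShard> &Shards,
                                   llvm::StringRef Filename) {
  return Shards[llvm::hash_value(Filename) % Shards.size()];
}

// Usage: the shared cache sizes its shard vector once up front, e.g.
//   std::vector<CacheShard> Shards(getShardCount());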

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
148

We want to have a mode that is as close to a regular clang -E invocation as possible, for correctness CI and testing. We also haven't evaluated the cost of keeping non-minimized sources in memory, so it might be too expensive for practical use. I can add a third option that caches but doesn't minimize as a follow-up patch, though.

arphaman updated this revision to Diff 212442.Jul 30 2019, 2:29 PM
bruno added a subscriber: bruno.Jul 31 2019, 12:02 PM
aganea added inline comments.Aug 1 2019, 9:26 AM
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
85

I'll give it a try on Windows 10; one project here has a large codebase and some wild machines to test on.

157

This creates only short-lived objects, is that right? Just for the duration of the call to CachedFileSystemEntry::createFileEntry?

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
94

"might be" twice.

148

Yes, that would be nice. As for the size taken in RAM, it would be only .H files, right? For Clang+LLVM+LLD I'm counting about 40 MB, but indeed, with a large project that would be about 1.5 GB of .H files. Although I doubt all these files would be loaded in memory at once (I'll check).

As for the usage: Fastbuild works like distcc (plain mode, not pump), so we were also planning on extracting the fully preprocessed output, not only the dependencies. There's one use case where we need to preprocess locally, then send the preprocessed output remotely for compilation, and another where we only want to extract the dependency list and compute a digest in order to retrieve the OBJ from a network cache.

arphaman updated this revision to Diff 213511.Aug 5 2019, 6:38 PM
arphaman marked 3 inline comments as done.

Fix comment typo

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
157

Yes, these VFS buffer files are only alive for a particular invocation.

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
148

Right now it doesn't differentiate between .H and other files, but we could teach it to have a header-only filter for the cache.

aganea added a comment.Aug 6 2019, 8:34 AM

Thanks for the update, Alex! Just a few more comments and we should be good to go:

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
118

Why not use a bump allocator here? (It would be per-thread.) On Windows the heap is always thread-safe, which induces a lock in malloc for each new entry. You could also avoid the use of unique_ptr at the same time:

llvm::StringMap<SharedFileSystemEntry, SpecificBumpPtrAllocator<SharedFileSystemEntry>> Cache;

(unless you're planning on removing entries from the cache later on?)

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
77

llvm::vfs::Status &&Stat to avoid a copy?

104

Shard.Cache.try_emplace(Key) will provide a default-constructed value if it's not there.
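For reference, a minimal sketch of the try_emplace pattern being suggested, assuming Shard.Cache is the StringMap of the shared cache, Key is the filename, and SharedFileSystemEntry is its value type:

// try_emplace default-constructs the value only if Key is absent, and the
// returned bool says whether an insertion actually happened.
auto InsertResult = Shard.Cache.try_emplace(Key);
SharedFileSystemEntry &Entry = InsertResult.first->getValue();
if (InsertResult.second) {
  // First time this shard sees the file: populate Entry here.
}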

118

CachedFileSystemEntry *Entry ?

199

CachedFileSystemEntry *Entry ?

clang/lib/Tooling/DependencyScanning/DependencyScanningWorker.cpp
148

No worries, I was simply wondering about the size in memory.

arphaman updated this revision to Diff 213664.Aug 6 2019, 11:22 AM
arphaman marked 6 inline comments as done.

Address review comments.

clang/include/clang/Tooling/DependencyScanning/DependencyScanningFilesystem.h
118

Good idea; I switched to a bump pointer allocator (I don't think I can use a specific one, as that would cause the values to be destroyed twice). I'm not planning on removing entries from the cache, no.
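A sketch of what that switch might look like; SharedFileSystemEntry is a simplified stand-in here, and the map shown is the per-shard cache:

#include "llvm/ADT/StringMap.h"
#include "llvm/Support/Allocator.h"

// Simplified stand-in for the real cached value type.
struct SharedFileSystemEntry {
  // stat result, original and minimized contents, ...
};

// The StringMap carves its entries out of the bump allocator, so the
// per-entry malloc (and the heap lock it can take on Windows) goes away.
// The bump allocator never runs destructors; the StringMap still destroys
// each value exactly once in its own destructor, which is why a plain
// BumpPtrAllocator avoids the double destruction a SpecificBumpPtrAllocator
// would cause.
llvm::StringMap<SharedFileSystemEntry, llvm::BumpPtrAllocator> Cache;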

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
77

The copy should already be avoided, as I move the argument when passing it in and when it's assigned to the result.

Just for reference, this patch still doesn't reuse the FileManager across invocations in a thread. We expect to get even better performance once we reuse it, but I'm going to have to improve its API first.

aganea accepted this revision.Aug 6 2019, 11:48 AM

LGTM, thank you!

Just for reference, this patch still doesn't reuse the FileManager across invocations in a thread. We expect to get even better performance once we reuse it, but I'm going to have to improve its API first.

Can't wait! @SamChaps is already testing this patch. He found some minor edge cases with the minimizer (most likely unrelated to this patch); I'll post a patch for those.

clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
77

If Stat is not an rvalue reference, wouldn't the std::move at the call site end up as a copy? See this. If you change int test(A a) to int test(A &&a), you can see the difference in the asm output. However, if the function is inlined, the extra copy will probably be optimized out. Not a big deal - the OS calls like stat() will most likely dominate here.

This revision is now accepted and ready to land.Aug 6 2019, 11:48 AM
arphaman marked an inline comment as done.Aug 6 2019, 1:38 PM
arphaman added inline comments.
clang/lib/Tooling/DependencyScanning/DependencyScanningFilesystem.cpp
77

Isn't the difference in the output caused by the details of the calling convention (passing the structure on the stack by value)? The move constructor should still be called for the stat; it will not copy its members. We can optimize this and pass by reference directly, I agree, so let me do that.
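As a side note, the trade-off being discussed can be summarized with a small sketch (a hypothetical Status-like type, not the patch's code):

#include <string>
#include <utility>

// Hypothetical stand-in for llvm::vfs::Status: cheap to move, costly to copy.
struct FakeStatus {
  std::string Name; // imagine several fields like this
};

struct Entry {
  FakeStatus Stat;

  // (1) By value: the caller's std::move moves into the parameter, and the
  //     body moves again into the member -- two moves, no copy of the data,
  //     but the parameter object itself is passed per the calling convention.
  void setByValue(FakeStatus S) { Stat = std::move(S); }

  // (2) By rvalue reference: no separate parameter object; the body moves
  //     straight from the caller's argument.
  void setByRValueRef(FakeStatus &&S) { Stat = std::move(S); }
};

// Both call sites look identical and neither copies the string data:
//   Entry E;
//   E.setByValue(std::move(Stat));
//   E.setByRValueRef(std::move(Stat));
// The remaining difference is how the temporary is materialized and passed,
// which the reviewers above agree is dwarfed by the cost of the underlying
// stat() system call.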

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 6 2019, 1:43 PM