This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Drop usage of PreambleStatCache in scanPreamble
ClosedPublic

Authored by kadircet on Jun 12 2020, 1:30 AM.

Details

Summary

It was passed to buildCompilerInvocation to speed up stats, but
PreambleStatCache doesn't contain results for the stats performed while
building the compiler invocation, so the cache always missed and the
optimization was unnecessary.

Furthermore, buildCompilerInvocation in scanPreamble doesn't need to
find the gcc installation, include paths and such, as it is only trying
to lex directives. Hence we pass an empty FS to get rid of any
redundant IO.
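
To make the mechanics concrete, here is a minimal sketch of the idea, assuming clangd's ParseInputs carries the FS consumed by buildCompilerInvocation (as in Compiler.h at the time); this is an illustration, not the exact patch:

  #include "llvm/ADT/IntrusiveRefCntPtr.h"
  #include "llvm/Support/VirtualFileSystem.h"

  // Directive scanning only lexes the preamble region, so the driver's
  // probing for GCC installations, sysroots and built-in include paths is
  // irrelevant here. An empty in-memory FS makes every probe fail fast
  // instead of touching the disk. buildInvocationForScan is a hypothetical
  // wrapper around clangd's existing buildCompilerInvocation helper.
  std::unique_ptr<clang::CompilerInvocation>
  buildInvocationForScan(ParseInputs Inputs, clang::DiagnosticConsumer &D) {
    Inputs.FS = new llvm::vfs::InMemoryFileSystem(); // nothing to stat or open
    return buildCompilerInvocation(Inputs, D);
  }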

Diff Detail

Event Timeline

kadircet created this revision. Jun 12 2020, 1:30 AM
Herald added a project: Restricted Project. Jun 12 2020, 1:30 AM

As discussed offline, the cache is always missing today, but we do have reason to believe we're doing a fair amount of IO in buildCompilerInvocation and it should be very cacheable.
So dropping the cache here might be the wrong direction vs fixing it.
I don't know:

  • for sure how much IO is there or what it is
  • whether it's mostly stats and so could be handled by this mechanism
  • exactly what scope we should be filling the cache at (it's really tempting to do it on startup, which is a bigger change)

If the plan is to make the statcache wrap providers instead of/as well as FSes, do we actually need this change? (Then we can avoid having to decide right now)

clang-tools-extra/clangd/Preamble.cpp
215

This is confusing - this class is essentially unused (move to next patch?) and erasing the class with a lambda seems unnecessarily obscure.

248

(This change isn't related to the description, which threw me for a loop - may want to defer it)

Looked at this in more detail. Today we call buildCompilerInvocation in four different places:

  • Once in TUScheduler whenever we receive an update; later on it propagates into buildPreamble and buildAST.
  • Once during signatureHelp/codeCompletion. In theory that one can benefit from the stats we've performed during the update above.
  • Two times during preamblePatching, as we create two different compiler instances. It seems like we don't care about those stats here at all, as they are only used to figure out resource/stdlib directories and other built-in include paths, and preamble patching doesn't care about those.
  • Once for read operations, if the AST was evicted from the cache; again, this can benefit from the IO done while building the invocation during the last update request.

As for the amount of IO done during buildCompilerInvocation, I've run a simple test and here is a summary:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 80.71    0.002330           5       394       374 openat
 10.43    0.000301           4        71        39 stat
  1.49    0.000043           3        12           write
  1.42    0.000041           5         8           lstat
  1.28    0.000037           6         6           getdents64
  1.07    0.000031           1        21           fstat
  1.00    0.000029           1        20           close
  0.87    0.000025           4         6           readlink
  0.55    0.000016           2         6           pread64
  0.38    0.000011          11         1           getcwd
  0.31    0.000009           4         2           ioctl
  0.28    0.000008           8         1         1 lseek
  0.21    0.000006           0        50           mmap
  0.00    0.000000           0        10           read
  0.00    0.000000           0        35        34 access
  0.00    0.000000           0         1           execve
------ ----------- ----------- --------- --------- ----------------
100.00    0.002887                   644       448 total

Looks like caching only stats, especially without caching failures, is not going to be a big win.
Most of the IO is done during directory listing (openat). Even though caching "success" for directory listings might be hard, we can cache only the failures, to mark missing files.
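
A rough sketch of what such failure caching could look like, built on llvm::vfs::ProxyFileSystem (the class name and structure here are illustrative, not an existing clangd type):

  #include "llvm/ADT/SmallString.h"
  #include "llvm/ADT/StringMap.h"
  #include "llvm/Support/VirtualFileSystem.h"
  #include <mutex>
  #include <system_error>

  // Wraps a real FS and replays cached failures for directory enumeration,
  // so the driver's repeated probes of nonexistent GCC install dirs skip the
  // syscall. Only failures are cached: an existing directory may gain or
  // lose entries, but a missing one is assumed to stay missing for the
  // lifetime of the cache. A real implementation would canonicalize paths.
  class DirFailureCachingFS : public llvm::vfs::ProxyFileSystem {
  public:
    using ProxyFileSystem::ProxyFileSystem;

    llvm::vfs::directory_iterator dir_begin(const llvm::Twine &Dir,
                                            std::error_code &EC) override {
      llvm::SmallString<128> Path;
      Dir.toVector(Path);
      {
        std::lock_guard<std::mutex> Lock(Mu);
        auto It = FailedDirs.find(Path);
        if (It != FailedDirs.end()) {
          EC = It->second; // replay the cached failure
          return llvm::vfs::directory_iterator();
        }
      }
      auto Result = ProxyFileSystem::dir_begin(Dir, EC);
      if (EC) {
        std::lock_guard<std::mutex> Lock(Mu);
        FailedDirs.try_emplace(Path, EC);
      }
      return Result;
    }

  private:
    std::mutex Mu;
    llvm::StringMap<std::error_code> FailedDirs;
  };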

So ISTM:

  • buildCompilerInvocation usage inside scanPreamble doesn't need access to any files, so I suggest we just pass an empty FS.
  • we need a different cache for buildCompilerInvocation, one that caches dir_begin() failures.
  • IO done by buildCompilerInvocation depends on compiler args like sysroot, stdlib (and likely many more), hence I believe they should be cached inside TUScheduler and invalidated whenever compile commands change (maybe even more aggressively: a new one on each update, since the underlying FS can change too).

I am planning to just drop FS usage inside scanPreamble for good, and let the new caching mechanism emerge when we need it more :D WDYT?

kadircet added a comment. Edited Jun 15 2020, 3:48 AM

Moreover, the most time-consuming bit (though this might be an artifact of ltrace not counting IO wait times, as openat above was only 5 usecs/call while a tolower call is 56 usecs/call :D) actually seems to be the string comparisons/manipulations done while parsing the command-line args:

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 48.73    2.244680          56     39747 tolower
 21.88    1.008019          54     18522 strlen
  7.18    0.330929          54      6055 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
  5.45    0.251138          54      4605 _ZdlPv
  4.69    0.216144          54      3961 bcmp
  4.61    0.212397          54      3890 memcpy
  1.44    0.066164          54      1204 _Znwm

So caching the result of command-line parsing might actually be better than caching IO. This also seems quite easy, as that part of the code is already isolated into a clang::driver::Compilation.
We can cache the result of command-line parsing and invalidate it whenever the compile command changes (this is safe, as the parse is a pure function; well, maybe minus env variables, haven't checked in detail), and later on use this for creating a compiler invocation.
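
An illustrative sketch of that memoization, keyed on the exact argv (ParsedCommandLine, parseCommandLine and CommandLineCache are hypothetical stand-ins, not existing clangd APIs):

  #include "llvm/ADT/StringMap.h"
  #include <memory>
  #include <mutex>
  #include <string>
  #include <vector>

  // Stand-in for whatever parsed driver output would be cached (e.g. the
  // cc1 arg list feeding CompilerInvocation).
  struct ParsedCommandLine;
  std::shared_ptr<const ParsedCommandLine>
  parseCommandLine(const std::vector<std::string> &Argv); // the expensive part

  class CommandLineCache {
  public:
    std::shared_ptr<const ParsedCommandLine>
    getOrParse(const std::vector<std::string> &Argv) {
      // '\0' can't appear inside an argument, so this concatenation is an
      // unambiguous key for the whole command line.
      std::string Key;
      for (const std::string &Arg : Argv) {
        Key += Arg;
        Key.push_back('\0');
      }
      std::lock_guard<std::mutex> Lock(Mu);
      auto &Slot = Cache[Key];
      if (!Slot)
        Slot = parseCommandLine(Argv); // tolower/strlen-heavy work runs once
      return Slot;
    }

  private:
    std::mutex Mu;
    llvm::StringMap<std::shared_ptr<const ParsedCommandLine>> Cache;
  };

Invalidation then amounts to clearing (or keying past) entries whenever the compile command changes, matching the pure-function assumption above.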

Thanks for all this investigation!

80.71 0.002330 5 394 374 openat

I'm curious what the 400 attempts and 20 successes are (I've seen this before but don't remember now). Probably not worth digging into though unless you happen to have the strace logs.

buildCompilerInvocation usage inside scanPreamble doesn't need any access to any files, so I suggest we just pass empty FS

I guess this makes sense. My only worry is the driver getting into a different state if probing or cwd or something fails. But this really shouldn't affect preamble scanning. If it's safe, this seems worth doing just to have more isolation.

we need a different cache for buildCompilerInvocation, one that caches dir_begin() failures

Yeah this is complicated - worthwhile if the IO is actually adding ~20ms. Easiest way to tell if tracing tools aren't helping might be to use an empty FS and ignore all the resulting problems - timing for buildCompilerInvocation should be correct.
If needed, maybe the record/replay FSes used for lldb reproducers are usable? Nice to avoid that complexity if possible though.

48.73 2.244680 56 39747 tolower

How many per call to buildCompilerInvocation? Maybe arg parsing is doing something dumb...

kadircet updated this revision to Diff 271083. Jun 16 2020, 7:08 AM
kadircet marked 2 inline comments as done.
  • Drop VFS usage in scanPreamble completely to avoid IO.

Thanks for all this investigation!

80.71 0.002330 5 394 374 openat

I'm curious what the 400 attempts and 20 successes are (I've seen this before but don't remember now). Probably not worth digging into though unless you happen to have the strace logs.

This is mostly gcc installation scanning for libc and such, biggest call site in https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L2408, which is called multiple times from https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/ToolChains/Gnu.cpp#L1907.

buildCompilerInvocation usage inside scanPreamble doesn't need any access to any files, so I suggest we just pass empty FS

I guess this makes sense. My only worry is the driver getting into a different state if probing or cwd or something fails. But this really shouldn't affect preamble scanning. If it's safe, this seems worth doing just to have more isolation.

Changing this patch to do that instead.

we need a different cache for buildCompilerInvocation, one that caches dir_begin() failures

Yeah this is complicated - worthwhile if the IO is actually adding ~20ms. Easiest way to tell if tracing tools aren't helping might be to use an empty FS and ignore all the resulting problems - timing for buildCompilerInvocation should be correct.
If needed, maybe the record/replay FSes used for lldb reproducers are usable? Nice to avoid that complexity if possible though.

Benchmarked with an empty in-memory FS and the real filesystem, using fallback commands (clang a.cc). There seems to be about a 6x speedup when buildCompilerInvocation is run without IO:
the empty FS takes about 0.17 ms on average, whereas the real file system takes about 1.01 ms.
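
For reference, roughly how such per-call averages can be taken (a generic sketch; the callable would wrap buildCompilerInvocation with either the real FS or an empty in-memory one plugged into the inputs):

  #include <chrono>

  // Average wall time of a callable over many iterations; averaging smooths
  // out cold-cache noise in the first few runs.
  template <typename Fn> double averageMillis(Fn &&F, int Iters = 1000) {
    auto Start = std::chrono::steady_clock::now();
    for (int I = 0; I < Iters; ++I)
      F();
    std::chrono::duration<double, std::milli> Elapsed =
        std::chrono::steady_clock::now() - Start;
    return Elapsed.count() / Iters;
  }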

Changed the compile command to something google3-sized (~400 args):
the empty FS takes about 1.4 ms, whereas the real IO takes about 1.8 ms.

So shaving off some IO might help a lot for trivial command lines, but for complicated commands we need to improve command line parsing or start caching the result.

48.73 2.244680 56 39747 tolower

How many per call to buildCompilerInvocation? Maybe arg parsing is doing something dumb...

this is only a single call to buildCompilerInvocation :(

Just for fun, top 5 library calls in a complicated command line case (~400 args):

% time     seconds  usecs/call     calls      function
------ ----------- ----------- --------- --------------------
 50.61    3.815022          54     69820 tolower
 25.35    1.911093          55     34416 strlen
  9.35    0.704789          53     13067 bcmp
  5.09    0.383536          56      6773 _ZNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEE9_M_appendEPKcm
  3.69    0.278408          56      4935 _ZdlPv

so calls to tolower/strlen seem to be scaling sub-linearly (the previous command line had only 2 args, so that's about a 200x increase in args, whereas the call counts seem to have only doubled).

kadircet edited the summary of this revision. Jun 16 2020, 7:09 AM
sammccall accepted this revision. Jun 17 2020, 8:09 AM
This revision is now accepted and ready to land. Jun 17 2020, 8:09 AM
This revision was automatically updated to reflect the committed changes.