This is an archive of the discontinued LLVM Phabricator instance.

Paths

- clang-tools-extra/clangd/ClangdLSPServer.cpp
- clang-tools-extra/clangd/GlobalCompilationDatabase.h
- clang-tools-extra/clangd/GlobalCompilationDatabase.cpp
- clang-tools-extra/clangd/unittests/GlobalCompilationDatabaseTests.cpp

Differential D148663

[RFC][clangd] Use interpolation for CDB pushed via LSP protocol
Abandoned · Public

Authored by DmitryPolukhin on Apr 18 2023, 2:35 PM.

Details

Reviewers

kadircet
nridge
sammccall
ilya-biryukov

Summary

Currently clangd only interpolates CDBs loaded from disk and doesn't do any
interpolation for CDBs pushed via the LSP protocol. This diff adds the same
interpolation logic that is used for CDBs loaded from disk.
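To illustrate what "interpolation" means here, the following is a deliberately simplified C++ sketch, not clang's algorithm: an exact match on the pushed path wins; otherwise the flags of the pushed entry whose path shares the longest prefix with the requested file are reused, with the donor's filename swapped for the requested one. The names Command and lookupWithInterpolation and the prefix-length scoring are assumptions made for illustration. The interpolating wrapper this diff actually plugs in (tooling::inferMissingCompileCommands) uses a more elaborate heuristic.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for tooling::CompileCommand: just the argument vector.
struct Command {
  std::vector<std::string> Argv;
};

// Hypothetical proximity score: number of shared leading characters.
static std::size_t commonPrefixLength(const std::string &A,
                                      const std::string &B) {
  std::size_t N = 0;
  while (N < A.size() && N < B.size() && A[N] == B[N])
    ++N;
  return N;
}

// Exact match first; otherwise borrow the flags of the closest pushed entry
// and substitute the requested file for the donor's filename.
std::optional<Command>
lookupWithInterpolation(const std::map<std::string, Command> &Pushed,
                        const std::string &File) {
  auto Exact = Pushed.find(File);
  if (Exact != Pushed.end())
    return Exact->second; // Exact match: no interpolation happens.

  const std::pair<const std::string, Command> *Donor = nullptr;
  std::size_t BestScore = 0;
  for (const auto &Entry : Pushed) {
    std::size_t Score = commonPrefixLength(Entry.first, File);
    if (!Donor || Score > BestScore) {
      Donor = &Entry;
      BestScore = Score;
    }
  }
  if (!Donor)
    return std::nullopt; // Nothing pushed: let other layers answer.

  Command Result = Donor->second;
  for (std::string &Arg : Result.Argv)
    if (Arg == Donor->first)
      Arg = File; // Point the borrowed command at the requested file.
  return Result;
}
```

For example, with only /src/lib/foo.cc pushed, a request for /src/lib/foo.h reuses foo.cc's -DA=1 flag with the header's path substituted, while a request for /src/lib/foo.cc itself returns the pushed command unchanged.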

Test Plan: check-clangd

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

DmitryPolukhin created this revision. Apr 18 2023, 2:35 PM

Herald added a project: Restricted Project. Apr 18 2023, 2:35 PM

Herald added a subscriber: arphaman.

DmitryPolukhin requested review of this revision. Apr 18 2023, 2:35 PM

Herald added a subscriber: MaskRay. Apr 18 2023, 2:35 PM

Harbormaster completed remote builds in B226460: Diff 514748. Apr 18 2023, 2:57 PM

tdupes added a subscriber: tdupes. Apr 24 2023, 10:51 AM

@kadircet @nridge friendly ping, could you please take a look?

I wanted to chime in and provide a bit of context.
This was a long time ago and I might misremember, so take this with a grain of salt.

The idea behind pushing the CDB over LSP was to allow a capable client to fully control the commands produced for the files.
Decisions like interpolation were pushed towards the clients intentionally, not accidentally.
IIRC, the motivation back in the day was either sourcekit-lsp or Theia.

I will let others do the actual review; I just wanted to bring up the history for a complete picture.

In D148663#4298496, @ilya-biryukov wrote:

I wanted to chime in and provide a bit of context.
This was a long time ago and I might misremember, so take this with a grain of salt.

The idea behind pushing the CDB over LSP was to allow a capable client to fully control the commands produced for the files.
Decisions like interpolation were pushed towards the clients intentionally, not accidentally.
IIRC, the motivation back in the day was either sourcekit-lsp or Theia.

I will let others do the actual review; I just wanted to bring up the history for a complete picture.

Thank you for sharing the context. I completely agree that nothing should prevent clangd clients from fully controlling the CDB if they want to.
And if they do so, this diff does nothing, because there is an exact match. It only comes into play if the client provides a partial CDB and inference is required.
So I think it shouldn't break any existing scenarios. But IMHO inference is a good feature for clangd, because it allows pushing exactly the same CDB as in the
compile_commands.json file and reusing clangd's logic for transferring compile commands from sources to headers. Pushing commands via LSP might be preferable
to the file-based approach, to avoid a race condition between updating the file and clangd reading it. It also works better with build systems that generate
compile commands for files on the fly, where generating all of them in advance may not be possible.

In D148663#4301589, @DmitryPolukhin wrote:

And if they do so, this diff does nothing, because there is an exact match. It only comes into play if the client provides a partial CDB and inference is required.

(hypothesising below)
I think this depends on the client in question. If I were writing one and knew what I wanted to do for headers (e.g. I might have a project system that knows which build targets headers belong to),
I would rather see compile errors for a header (if I have bugs in my integration) than have clangd kick in with interpolation and silently hide the bug.

From what I recall, your case is different: you just want to "pipe" a compile_commands.json provided by some build system to clangd, but without clangd actually reading it.
I don't actually write an LSP client, though, so I would let @kadircet decide which path is preferable.

Pushing commands via LSP might be preferable to the file-based approach, to avoid a race condition between updating the file and clangd reading it

Ideally this should be handled with atomic writes (write to a temporary file and move it to the destination), at least on Linux and Mac. I'm not sure whether build systems do that, though.
Does Clangd ever re-read the compilation database now? It used to load it only once, so re-reading would require a restart of clangd anyway. A race on just one read is very unlikely (although not impossible).
However, the compile_commands.json file is parsed lazily, when the command for a particular file is actually requested; if you see crashes or inconsistencies in your scenarios, it is highly likely due to this.
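The atomic-write approach suggested above can be sketched with std::filesystem: write the complete contents to a temporary file next to the destination, then rename it over the destination, so a reader such as clangd sees either the old file or the complete new one. This is a minimal sketch under stated assumptions: the ".tmp" suffix is a hypothetical naming scheme (concurrent writers would need unique names), the temporary file must be on the same filesystem as the destination, and there is no fsync, so it guards against torn reads rather than against data loss on a crash. rename() replaces the destination atomically on POSIX systems such as Linux and macOS; on other platforms this depends on the implementation.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Writes Contents to Dest via a temporary sibling file plus rename(), so a
// concurrent reader sees either the old file or the complete new one.
bool writeFileAtomically(const fs::path &Dest, const std::string &Contents) {
  fs::path Tmp = Dest;
  Tmp += ".tmp"; // Hypothetical name; concurrent writers need unique names.
  std::error_code EC;
  {
    std::ofstream Out(Tmp, std::ios::binary | std::ios::trunc);
    Out << Contents;
    Out.flush();
    if (!Out) {
      Out.close();
      fs::remove(Tmp, EC); // Clean up a partial temporary file.
      return false;
    }
  } // The stream is closed here, before the rename.
  fs::rename(Tmp, Dest, EC); // Atomic replacement on POSIX (same filesystem).
  if (EC) {
    fs::remove(Tmp, EC);
    return false;
  }
  return true;
}
```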

It also works better with build systems that generate compile commands for files on the fly, where generating all of them in advance may not be possible.

FYI, there is separate support in Clangd for pushing commands per file via an LSP extension.
I think this should cover build systems that generate commands on the fly better, but it also does not use interpolation.
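For reference, a push through this extension might look like the following sketch, which builds a workspace/didChangeConfiguration notification with the LSP Content-Length framing. The compilationDatabaseChanges, workingDirectory and compilationCommand keys are the fields that ClangdLSPServer::applyConfiguration reads in the diff of this revision; the helper names jsonQuote and makeCompileCommandUpdate and the minimal JSON escaping are assumptions of this sketch, and a real client would use a proper JSON library. In this diff's version of applyConfiguration, an empty compilationCommand is turned into std::nullopt, i.e. it clears the override for that file.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Minimal JSON string quoting for this sketch: escapes quotes, backslashes,
// newlines and tabs only. A real client should use a JSON library.
std::string jsonQuote(const std::string &S) {
  std::string Out = "\"";
  for (char C : S) {
    switch (C) {
    case '"':
      Out += "\\\"";
      break;
    case '\\':
      Out += "\\\\";
      break;
    case '\n':
      Out += "\\n";
      break;
    case '\t':
      Out += "\\t";
      break;
    default:
      Out += C;
    }
  }
  return Out + "\"";
}

// Builds a framed workspace/didChangeConfiguration notification that pushes
// the compile command for a single file.
std::string makeCompileCommandUpdate(const std::string &File,
                                     const std::string &WorkingDirectory,
                                     const std::vector<std::string> &Argv) {
  std::string Args = "[";
  for (std::size_t I = 0; I < Argv.size(); ++I)
    Args += (I ? "," : "") + jsonQuote(Argv[I]);
  Args += "]";

  std::string Body =
      "{\"jsonrpc\":\"2.0\",\"method\":\"workspace/didChangeConfiguration\","
      "\"params\":{\"settings\":{\"compilationDatabaseChanges\":{" +
      jsonQuote(File) + ":{\"workingDirectory\":" +
      jsonQuote(WorkingDirectory) + ",\"compilationCommand\":" + Args +
      "}}}}}";
  // LSP base protocol framing: Content-Length header, blank line, body.
  return "Content-Length: " + std::to_string(Body.size()) + "\r\n\r\n" + Body;
}
```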

A meta-comment: it would be really useful to understand a bit better how you use Clangd. Would it be hard to write down what kind of client you have and what its requirements for Clangd are? Which build systems are at play, how does the client use them, etc.?
To validate whether patches like this one are a good fit for the problem or whether there are better ways, it's really useful to have a higher-level overview of how Clangd is being used.

In D148663#4305202, @ilya-biryukov wrote:

In D148663#4301589, @DmitryPolukhin wrote:

And if they do so, this diff does nothing, because there is an exact match. It only comes into play if the client provides a partial CDB and inference is required.

(hypothesising below)
I think this depends on the client in question. If I were writing one and knew what I wanted to do for headers (e.g. I might have a project system that knows which build targets headers belong to),
I would rather see compile errors for a header (if I have bugs in my integration) than have clangd kick in with interpolation and silently hide the bug.

From what I recall, your case is different: you just want to "pipe" a compile_commands.json provided by some build system to clangd, but without clangd actually reading it.
I don't actually write an LSP client, though, so I would let @kadircet decide which path is preferable.

We have an LSP client that works as a proxy and exposes the LSP protocol to a higher level. Currently we do very deep processing of the CDB and partially replicate clangd's logic for command inference.
Clangd usually does a good job at command inference, and we would like to use this feature instead of keeping our own logic up to date. I can put this inference logic for LSP behind a command-line flag
if you think it might really break some legitimate use case. Please let me know.

Pushing commands via LSP might be preferable to the file-based approach, to avoid a race condition between updating the file and clangd reading it

Ideally this should be handled with atomic writes (write to a temporary file and move it to the destination), at least on Linux and Mac. I'm not sure whether build systems do that, though.
Does Clangd ever re-read the compilation database now? It used to load it only once, so re-reading would require a restart of clangd anyway. A race on just one read is very unlikely (although not impossible).
However, the compile_commands.json file is parsed lazily, when the command for a particular file is actually requested; if you see crashes or inconsistencies in your scenarios, it is highly likely due to this.

Yes, clangd re-reads the CDB in some cases, but I think it won't try to read a CDB from a directory again if there was none there before, for performance reasons. Synchronisation here is hard: we cannot control when
clangd will actually read the CDB, so it might use the wrong flags. Also, we run multiple clangd instances behind a multiplexing proxy, and they might need different CDBs. Finally, the CDB tends to become large and hard to manage, so
the per-file LSP protocol has lots of advantages for us.

It also works better with build systems that generate compile commands for files on the fly, where generating all of them in advance may not be possible.

FYI, there is separate support in Clangd for pushing commands per file via an LSP extension.
I think this should cover build systems that generate commands on the fly better, but it also does not use interpolation.

That is exactly the LSP extension we are using, and it is where I added inference.

A meta-comment: it would be really useful to understand a bit better how you use Clangd. Would it be hard to write down what kind of client you have and what its requirements for Clangd are? Which build systems are at play, how does the client use them, etc.?
To validate whether patches like this one are a good fit for the problem or whether there are better ways, it's really useful to have a higher-level overview of how Clangd is being used.

We use Buck and Buck2 as our main build systems; some projects use other build systems such as CMake, and in general we don't restrict which build systems a subproject might use.
Therefore we cannot limit ourselves to any particular clangd version and have to support several of them simultaneously, because individual projects might use incompatible features. So we have an LSP multiplexing proxy that
encapsulates build-system specifics and combines results from several clangd instances. In our understanding, changes we would like to upstream to clangd should not only be usable in our setup but should also improve clangd for all users.

I agree with Ilya's concerns here.

We deliberately don't mess with compile flags pushed over LSP. These are "overrides" to whatever information we have from other sources; turning on interpolation at this override layer implies we'll never fall back to other sources of information (as inference will always pick a target, it doesn't have a "that's too bad" threshold).
The contract of LSP-based compile flag setting requires clients to be "capable" of managing files somehow. Having some mixed support is unlikely to benefit other users and more likely to break things, as we'd be changing behaviour now (instead of falling back to the underlying compilation database, we'd interpolate).

In D148663#4318907, @kadircet wrote:

I agree with Ilya's concerns here.

We deliberately don't mess with compile flags pushed over LSP. These are "overrides" to whatever information we have from other sources; turning on interpolation at this override layer implies we'll never fall back to other sources of information (as inference will always pick a target, it doesn't have a "that's too bad" threshold).
The contract of LSP-based compile flag setting requires clients to be "capable" of managing files somehow. Having some mixed support is unlikely to benefit other users and more likely to break things, as we'd be changing behaviour now (instead of falling back to the underlying compilation database, we'd interpolate).

Thank you for the feedback; I see your point. Would you mind if I made this behaviour conditional, behind a command-line flag?
I think most users never mix flags from different sources: if they push a CDB over LSP, all flags except the fallback ones come from that source, and the underlying compilation database is empty.
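The trade-off in this exchange can be modelled with a toy three-layer lookup: LSP-pushed overrides, then the underlying CDB, then fallback flags. This C++ sketch is hypothetical. LayeredCDB stands in for OverlayCDB, the InterpolateOverrides switch stands in for the command-line flag proposed above (which remained only a proposal, since this revision was abandoned), and the first override is a crude stand-in for real interpolation. It shows the concern raised in review: once interpolation is enabled and any override exists, every lookup is answered by the override layer, so files that only the underlying database knows about are shadowed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model of the lookup order discussed in this thread. Each "command" is
// reduced to one flag string; all names here are hypothetical.
struct LayeredCDB {
  std::map<std::string, std::string> Overrides; // pushed via LSP
  std::map<std::string, std::string> Base;      // e.g. compile_commands.json
  bool InterpolateOverrides = false;            // the proposed opt-in switch

  std::string lookup(const std::string &File) const {
    if (auto It = Overrides.find(File); It != Overrides.end())
      return It->second;
    // Interpolation has no "that's too bad" threshold: whenever any override
    // exists it produces an answer, so the layers below are never reached.
    if (InterpolateOverrides && !Overrides.empty())
      return Overrides.begin()->second; // stand-in for real interpolation
    if (auto It = Base.find(File); It != Base.end())
      return It->second;
    return "<fallback flags>";
  }
};
```

With the switch off, a file known only to the base layer still resolves from it; with the switch on, the same query returns the override's flags.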

DmitryPolukhin abandoned this revision.May 27 2023, 1:34 PM

Revision Contents (path: size)

clang-tools-extra/clangd/ClangdLSPServer.cpp: 21 lines
clang-tools-extra/clangd/GlobalCompilationDatabase.h: 14 lines
clang-tools-extra/clangd/GlobalCompilationDatabase.cpp: 75 lines
clang-tools-extra/clangd/unittests/GlobalCompilationDatabaseTests.cpp: 8 lines

Diff 514748

clang-tools-extra/clangd/ClangdLSPServer.cpp

	[unchanged lines omitted]
	void ClangdLSPServer::onInlayHint(const InlayHintsParams &Params,			void ClangdLSPServer::onInlayHint(const InlayHintsParams &Params,
	Callback<std::vector<InlayHint>> Reply) {			Callback<std::vector<InlayHint>> Reply) {
	Server->inlayHints(Params.textDocument.uri.file(), Params.range,			Server->inlayHints(Params.textDocument.uri.file(), Params.range,
	std::move(Reply));			std::move(Reply));
	}			}

	void ClangdLSPServer::applyConfiguration(			void ClangdLSPServer::applyConfiguration(
	const ConfigurationSettings &Settings) {			const ConfigurationSettings &Settings) {
	// Per-file update to the compilation database.			llvm::StringMap<std::optional<tooling::CompileCommand>> Commands;
	llvm::StringSet<> ModifiedFiles;
	for (auto &Entry : Settings.compilationDatabaseChanges) {			for (auto &Entry : Settings.compilationDatabaseChanges) {
	PathRef File = Entry.first;			PathRef File = Entry.first;
	auto Old = CDB->getCompileCommand(File);			if (Entry.second.compilationCommand.empty())
	auto New =			Commands.insert({File, std::nullopt});
	tooling::CompileCommand(std::move(Entry.second.workingDirectory), File,			else
				Commands.insert({File, tooling::CompileCommand(
				std::move(Entry.second.workingDirectory), File,
	std::move(Entry.second.compilationCommand),			std::move(Entry.second.compilationCommand),
	/*Output=*/"");			/*Output=*/"")});
	if (Old != New) {
	CDB->setCompileCommand(File, std::move(New));
	ModifiedFiles.insert(File);
	}
	}			}
				llvm::StringSet<> ModifiedFiles{CDB->setCompileCommands(std::move(Commands))};
	Server->reparseOpenFilesIfNeeded(			Server->reparseOpenFilesIfNeeded(
	[&](llvm::StringRef File) { return ModifiedFiles.count(File) != 0; });			[&](llvm::StringRef File) { return ModifiedFiles.count(File) != 0; });
	}			}

	void ClangdLSPServer::maybeExportMemoryProfile() {			void ClangdLSPServer::maybeExportMemoryProfile() {
	if (!trace::enabled() || !ShouldProfile())			if (!trace::enabled() || !ShouldProfile())
	return;			return;

	[unchanged lines omitted]

clang-tools-extra/clangd/GlobalCompilationDatabase.h

[unchanged lines omitted]
#include "support/Function.h"		#include "support/Function.h"
#include "support/Path.h"		#include "support/Path.h"
#include "support/Threading.h"		#include "support/Threading.h"
#include "support/ThreadsafeFS.h"		#include "support/ThreadsafeFS.h"
#include "clang/Tooling/ArgumentsAdjusters.h"		#include "clang/Tooling/ArgumentsAdjusters.h"
#include "clang/Tooling/CompilationDatabase.h"		#include "clang/Tooling/CompilationDatabase.h"
#include "llvm/ADT/FunctionExtras.h"		#include "llvm/ADT/FunctionExtras.h"
#include "llvm/ADT/StringMap.h"		#include "llvm/ADT/StringMap.h"
		#include "llvm/ADT/StringSet.h"
#include <memory>		#include <memory>
#include <mutex>		#include <mutex>
#include <optional>		#include <optional>
#include <vector>		#include <vector>

namespace clang {		namespace clang {
namespace clangd {		namespace clangd {

[unchanged lines omitted; hunk context: public:]
OverlayCDB(const GlobalCompilationDatabase *Base,		OverlayCDB(const GlobalCompilationDatabase *Base,
std::vector<std::string> FallbackFlags = {},		std::vector<std::string> FallbackFlags = {},
CommandMangler Mangler = nullptr);		CommandMangler Mangler = nullptr);

std::optional<tooling::CompileCommand>		std::optional<tooling::CompileCommand>
getCompileCommand(PathRef File) const override;		getCompileCommand(PathRef File) const override;
tooling::CompileCommand getFallbackCommand(PathRef File) const override;		tooling::CompileCommand getFallbackCommand(PathRef File) const override;

/// Sets or clears the compilation command for a particular file.		/// Sets compilation commands and return updated files.
void		llvm::StringSet<> setCompileCommands(
setCompileCommand(PathRef File,		llvm::StringMap<std::optional<tooling::CompileCommand>> Commands);
std::optional<tooling::CompileCommand> CompilationCommand);
		/// Legacy inefficient implementation that inserts one file at a time
		/// that is implemented as a wrapper on top of setCompileCommands above.
		void setCompileCommand(PathRef File,
		std::optional<tooling::CompileCommand> Cmd);

private:		private:
mutable std::mutex Mutex;		mutable std::mutex Mutex;
llvm::StringMap<tooling::CompileCommand> Commands; /* GUARDED_BY(Mut) */		llvm::StringMap<tooling::CompileCommand> Commands; /* GUARDED_BY(Mut) */
		std::unique_ptr<tooling::CompilationDatabase> CDB; /* GUARDED_BY(Mut) */
CommandMangler Mangler;		CommandMangler Mangler;
std::vector<std::string> FallbackFlags;		std::vector<std::string> FallbackFlags;
};		};

} // namespace clangd		} // namespace clangd
} // namespace clang		} // namespace clang

#endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_GLOBALCOMPILATIONDATABASE_H		#endif // LLVM_CLANG_TOOLS_EXTRA_CLANGD_GLOBALCOMPILATIONDATABASE_H

clang-tools-extra/clangd/GlobalCompilationDatabase.cpp

[unchanged lines omitted; hunk context: DirectoryBasedGlobalCompilationDatabase::getProjectInfo(PathRef File) const]
Req.FreshTime = Req.FreshTimeMissing =		Req.FreshTime = Req.FreshTimeMissing =
std::chrono::steady_clock::time_point::min();		std::chrono::steady_clock::time_point::min();
auto Res = lookupCDB(Req);		auto Res = lookupCDB(Req);
if (!Res)		if (!Res)
return std::nullopt;		return std::nullopt;
return Res->PI;		return Res->PI;
}		}

		// Helper class that exposes CDB pushed via LSP protocol as
		// tooling::CompilationDatabase for interpolation.
		class InMemoryCompilationDatabase : public tooling::CompilationDatabase {
		public:
		InMemoryCompilationDatabase(
		llvm::StringMap<tooling::CompileCommand> &Commands)
		: Commands(Commands) {}

		std::vector<tooling::CompileCommand>
		getCompileCommands(StringRef FilePath) const override {
		auto It = Commands.find(removeDots(FilePath));
		if (It != Commands.end())
		return {It->second};
		return {};
		}

		std::vector<std::string> getAllFiles() const override {
		std::vector<std::string> Res;
		Res.reserve(Commands.size());
		for (const auto &S : Commands.keys())
		Res.push_back(S.str());
		return Res;
		}

		private:
		// Use reference to OverlayCDB::Commands to avoid copies.
		llvm::StringMap<tooling::CompileCommand> &Commands;
		};

OverlayCDB::OverlayCDB(const GlobalCompilationDatabase *Base,		OverlayCDB::OverlayCDB(const GlobalCompilationDatabase *Base,
std::vector<std::string> FallbackFlags,		std::vector<std::string> FallbackFlags,
CommandMangler Mangler)		CommandMangler Mangler)
: DelegatingCDB(Base), Mangler(std::move(Mangler)),		: DelegatingCDB(Base), Mangler(std::move(Mangler)),
FallbackFlags(std::move(FallbackFlags)) {}		FallbackFlags(std::move(FallbackFlags)) {}

std::optional<tooling::CompileCommand>		std::optional<tooling::CompileCommand>
OverlayCDB::getCompileCommand(PathRef File) const {		OverlayCDB::getCompileCommand(PathRef File) const {
std::optional<tooling::CompileCommand> Cmd;		std::optional<tooling::CompileCommand> Cmd;
{		{
std::lock_guard<std::mutex> Lock(Mutex);		std::lock_guard<std::mutex> Lock(Mutex);
auto It = Commands.find(removeDots(File));		if (CDB) {
if (It != Commands.end())		auto Candidates = CDB->getCompileCommands(File);
Cmd = It->second;		if (!Candidates.empty())
		Cmd = std::move(Candidates.front());
		}
}		}
if (!Cmd)		if (!Cmd)
Cmd = DelegatingCDB::getCompileCommand(File);		Cmd = DelegatingCDB::getCompileCommand(File);
if (!Cmd)		if (!Cmd)
return std::nullopt;		return std::nullopt;
if (Mangler)		if (Mangler)
Mangler(*Cmd, File);		Mangler(*Cmd, File);
return Cmd;		return Cmd;
}		}

tooling::CompileCommand OverlayCDB::getFallbackCommand(PathRef File) const {		tooling::CompileCommand OverlayCDB::getFallbackCommand(PathRef File) const {
auto Cmd = DelegatingCDB::getFallbackCommand(File);		auto Cmd = DelegatingCDB::getFallbackCommand(File);
std::lock_guard<std::mutex> Lock(Mutex);		std::lock_guard<std::mutex> Lock(Mutex);
Cmd.CommandLine.insert(Cmd.CommandLine.end(), FallbackFlags.begin(),		Cmd.CommandLine.insert(Cmd.CommandLine.end(), FallbackFlags.begin(),
FallbackFlags.end());		FallbackFlags.end());
if (Mangler)		if (Mangler)
Mangler(Cmd, File);		Mangler(Cmd, File);
return Cmd;		return Cmd;
}		}

void OverlayCDB::setCompileCommand(PathRef File,		llvm::StringSet<> OverlayCDB::setCompileCommands(
std::optional<tooling::CompileCommand> Cmd) {		llvm::StringMap<std::optional<tooling::CompileCommand>> NewCommands) {
// We store a canonical version internally to prevent mismatches between set		llvm::StringSet<> ModifiedFiles;
// and get compile commands. Also it assures clients listening to broadcasts
// doesn't receive different names for the same file.
std::string CanonPath = removeDots(File);
{		{
std::unique_lock<std::mutex> Lock(Mutex);		std::unique_lock<std::mutex> Lock(Mutex);
if (Cmd)		for (auto &E : NewCommands) {
Commands[CanonPath] = std::move(*Cmd);		// We store a canonical version internally to prevent mismatches between
		// set and get compile commands. Also it assures clients listening to
		// broadcasts doesn't receive different names for the same file.
		std::string CanonPath = removeDots(E.getKey());
		if (E.getValue())
		Commands[CanonPath] = std::move(*E.getValue());
else		else
Commands.erase(CanonPath);		Commands.erase(CanonPath);
		ModifiedFiles.insert(CanonPath);
}		}
OnCommandChanged.broadcast({CanonPath});		CDB = tooling::inferMissingCompileCommands(
		std::unique_ptr<tooling::CompilationDatabase>(
		new InMemoryCompilationDatabase(Commands)));
		}
		for (const auto &S : ModifiedFiles.keys())
		OnCommandChanged.broadcast({S.str()});
		return ModifiedFiles;
		}

		void OverlayCDB::setCompileCommand(PathRef File,
		std::optional<tooling::CompileCommand> Cmd) {
		llvm::StringMap<std::optional<tooling::CompileCommand>> NewCommands;
		NewCommands[File] = std::move(Cmd);
		setCompileCommands(NewCommands);
}		}

DelegatingCDB::DelegatingCDB(const GlobalCompilationDatabase *Base)		DelegatingCDB::DelegatingCDB(const GlobalCompilationDatabase *Base)
: Base(Base) {		: Base(Base) {
if (Base)		if (Base)
BaseChanged = Base->watch([this](const std::vector<std::string> Changes) {		BaseChanged = Base->watch([this](const std::vector<std::string> Changes) {
OnCommandChanged.broadcast(Changes);		OnCommandChanged.broadcast(Changes);
});		});
[unchanged lines omitted]
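The diff routes per-file updates through the batched setCompileCommands because, as the GlobalCompilationDatabase.cpp hunk above shows, every call recreates the interpolating wrapper over the whole Commands map via tooling::inferMissingCompileCommands, and that wrapper consults the full file list (getAllFiles) to choose donors. The toy model below counts those rebuilds; ToyOverlay, rebuildIndex and rebuilds are hypothetical names, and the "index" is just a copy of the file list. In this model, pushing 100 files one at a time rebuilds 100 times, while a single batch rebuilds once.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class ToyOverlay {
public:
  using Cmd = std::vector<std::string>;

  // Batched update: apply every change, then rebuild the derived index once.
  std::vector<std::string>
  setCompileCommands(const std::map<std::string, std::optional<Cmd>> &Changes) {
    std::vector<std::string> Modified;
    for (const auto &[File, NewCmd] : Changes) {
      if (NewCmd)
        Commands[File] = *NewCmd;
      else
        Commands.erase(File); // std::nullopt clears the entry.
      Modified.push_back(File);
    }
    rebuildIndex();
    return Modified;
  }

  // Per-file wrapper, like the diff's setCompileCommand: one rebuild per call.
  void setCompileCommand(const std::string &File, std::optional<Cmd> C) {
    std::map<std::string, std::optional<Cmd>> One;
    One[File] = std::move(C);
    setCompileCommands(One);
  }

  int rebuilds() const { return Rebuilds; }

private:
  // Stand-in for re-running inferMissingCompileCommands over all entries.
  void rebuildIndex() {
    ++Rebuilds;
    IndexedFiles.clear();
    for (const auto &Entry : Commands)
      IndexedFiles.push_back(Entry.first); // touches every stored file
  }

  std::map<std::string, Cmd> Commands;
  std::vector<std::string> IndexedFiles;
  int Rebuilds = 0;
};
```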

clang-tools-extra/clangd/unittests/GlobalCompilationDatabaseTests.cpp

[unchanged lines omitted; hunk context: TEST_F(OverlayCDBTest, GetCompileCommand)]
EXPECT_THAT(CDB.getCompileCommand(testPath("foo.cc"))->CommandLine,		EXPECT_THAT(CDB.getCompileCommand(testPath("foo.cc"))->CommandLine,
AllOf(Contains(testPath("foo.cc")), Contains("-DA=1")));		AllOf(Contains(testPath("foo.cc")), Contains("-DA=1")));
EXPECT_EQ(CDB.getCompileCommand(testPath("missing.cc")), std::nullopt);		EXPECT_EQ(CDB.getCompileCommand(testPath("missing.cc")), std::nullopt);

auto Override = cmd(testPath("foo.cc"), "-DA=3");		auto Override = cmd(testPath("foo.cc"), "-DA=3");
CDB.setCompileCommand(testPath("foo.cc"), Override);		CDB.setCompileCommand(testPath("foo.cc"), Override);
EXPECT_THAT(CDB.getCompileCommand(testPath("foo.cc"))->CommandLine,		EXPECT_THAT(CDB.getCompileCommand(testPath("foo.cc"))->CommandLine,
Contains("-DA=3"));		Contains("-DA=3"));
EXPECT_EQ(CDB.getCompileCommand(testPath("missing.cc")), std::nullopt);		// Expect interpolation from foo.cc
CDB.setCompileCommand(testPath("missing.cc"), Override);
EXPECT_THAT(CDB.getCompileCommand(testPath("missing.cc"))->CommandLine,		EXPECT_THAT(CDB.getCompileCommand(testPath("missing.cc"))->CommandLine,
Contains("-DA=3"));		Contains("-DA=3"));
		// Check that explicit override replaces interpolation
		Override = cmd(testPath("missing.cc"), "-DA=4");
		CDB.setCompileCommand(testPath("missing.cc"), Override);
		EXPECT_THAT(CDB.getCompileCommand(testPath("missing.cc"))->CommandLine,
		Contains("-DA=4"));
}		}

TEST_F(OverlayCDBTest, GetFallbackCommand) {		TEST_F(OverlayCDBTest, GetFallbackCommand) {
OverlayCDB CDB(Base.get(), {"-DA=4"});		OverlayCDB CDB(Base.get(), {"-DA=4"});
EXPECT_THAT(CDB.getFallbackCommand(testPath("bar.cc")).CommandLine,		EXPECT_THAT(CDB.getFallbackCommand(testPath("bar.cc")).CommandLine,
ElementsAre("clang", "-DA=2", testPath("bar.cc"), "-DA=4"));		ElementsAre("clang", "-DA=2", testPath("bar.cc"), "-DA=4"));
}		}

[unchanged lines omitted]