
Details

Reviewers

Commits

rGe397a0a5c3c0: [clangd] Add instrumentation mode in clangd for metrics collection.

Summary

This patch adds an instrumentation mode to clangd (enabled by the
corresponding option in cc_opts).
When this mode is enabled, the user can specify callbacks to run on the
final code completion result.

Moreover, when this mode is enabled, CodeCompletion::Score will contain the
detailed Quality and Relevance signals used to compute the score.
This is required because there is currently no place where the final
candidates (scored and sorted) are available along with these
signals; the signals are temporary structures in addCandidate.

The callback is needed because it gives access to many data structures that
are internal to CodeCompleteFlow and are available once Sema has run, e.g.
ScopeDistance and FileDistance.

If this mode is disabled (the default), Score just contains two null
shared pointers, so the added cost (memory/time) in the default
mode is insignificant.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

usaxena95 created this revision. Mar 4 2020, 5:11 AM

Herald added a project: Restricted Project. Mar 4 2020, 5:11 AM

Herald added subscribers: cfe-commits, kadircet, arphaman and 3 others.

Addressed linter issues.

Harbormaster failed remote builds in B48034: Diff 248151! Mar 4 2020, 6:11 AM

sammccall added inline comments. Mar 4 2020, 6:21 AM

clang-tools-extra/clangd/CodeComplete.h
140	I'm wondering if we can simplify this interface a bit. I'm not sure why we need another callback rather than just returning the CodeCompleteResult in the usual way.
	Another option: if we invoke a callback for each completion instead of for the result set as a whole, we don't need to work out where to stash anything.
	- The work done after `addCandidate` is pretty trivial, so invoking a callback there provides basically all the information about the result set.
	- The Top-N truncation is probably something you'd rather not have for analysis.
	- Code completion always ends with the callback being invoked, so cross-result analysis can be done at that point.
	So I think this could just be a single `std::function<void(const CodeCompletion&, const SymbolQualitySignals &, const SymbolRelevanceSignals &)>`. If it's non-null, addCandidate would call toCodeCompletion on the bundle and pass it to the callback at the end.
220	why shared rather than unique?
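The per-completion callback suggested in the comment on line 140 can be sketched in isolation as follows. This is a minimal illustration under stated assumptions, not clangd's code: `Completion`, `QualitySignals`, `RelevanceSignals`, `Options`, `RecordResult` and `collectNames` are hypothetical stand-ins, and this `addCandidate` only mimics the shape of the real one.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for clangd's CodeCompletion and signal structs.
struct Completion { std::string Name; };
struct QualitySignals { float References = 0.f; };
struct RelevanceSignals { float NameMatch = 1.f; };

struct Options {
  // Empty by default; when set, it is invoked once per scored candidate.
  std::function<void(const Completion &, const QualitySignals &,
                     const RelevanceSignals &)>
      RecordResult;
};

// Simplified analogue of addCandidate: compute signals, then report them
// while they are still alive as local temporaries.
void addCandidate(const Options &Opts, const std::string &Name) {
  QualitySignals Quality;
  RelevanceSignals Relevance;
  Quality.References = static_cast<float>(Name.size()); // placeholder signal
  if (Opts.RecordResult)
    Opts.RecordResult(Completion{Name}, Quality, Relevance);
}

// Runs a fake completion over Names and returns what the callback observed.
std::vector<std::string> collectNames(const std::vector<std::string> &Names) {
  std::vector<std::string> Seen;
  Options Opts;
  Opts.RecordResult = [&Seen](const Completion &C, const QualitySignals &,
                              const RelevanceSignals &) {
    Seen.push_back(C.Name);
  };
  for (const auto &N : Names)
    addCandidate(Opts, N);
  return Seen;
}
```

Because the callback fires inside the candidate loop, nothing has to be stashed in the result type, and any cross-result analysis can run once the loop has finished.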

kadircet added inline comments. Mar 4 2020, 6:33 AM

clang-tools-extra/clangd/CodeComplete.cpp
1463	can't we make use of the trace::Span instead ?

Harbormaster failed remote builds in B48040: Diff 248160! Mar 4 2020, 7:19 AM

Changed to invoke callback on all code completion items.

usaxena95 marked 3 inline comments as done. Mar 4 2020, 10:26 AM

usaxena95 added inline comments.

clang-tools-extra/clangd/CodeComplete.cpp
1463	CMIIW. I believe with `trace::Span` we can send only JSON messages from clangd to the tool running it. This doesn't allow us to get access to internal DS of CodeCompleteFlow that are used along with Quality and Relevance signals (like Distances of file and scopes). You can argue that all this information can be serialized as a JSON (including features derived from these internal DS) but this then must be done on clangd side (not on tools side). IMO this gives more freedom to the tool to use and derive more features which makes experimentation easier.
clang-tools-extra/clangd/CodeComplete.h
140	> why we need another callback rather than just returning the CodeCompleteResult in the usual way.
	There are some data structures from CodeCompleteFlow referred to in the Relevance signals, like `ScopeDistance`, which were needed to compute distances. But I think this can be addressed by your suggested "per completion" callback approach.
	> Another option: ...
	I had given this approach some thought previously and had concerns about mapping the items to the final ranked items in the TopN result. But on second thought, we can completely ignore the final result (assuming no significant changes are made after `addCandidate`) and score and rank the results ourselves in the FlumeTool.
220	A not-so-proud hack to keep `Score` copyable (this is removed now).
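The copyability point behind the "shared rather than unique" exchange can be checked with a small sketch. The struct names below are hypothetical and only mirror the idea of a score type that carries an optional pointer to detailed signals; they are not the types from the patch.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Signals { float Value = 0.f; }; // hypothetical payload

// A unique_ptr member deletes the implicit copy constructor.
struct ScoreWithUnique {
  float Total = 0.f;
  std::unique_ptr<Signals> Detail;
};

// A shared_ptr member keeps the struct copyable; copies share the payload.
struct ScoreWithShared {
  float Total = 0.f;
  std::shared_ptr<Signals> Detail;
};

static_assert(!std::is_copy_constructible<ScoreWithUnique>::value,
              "unique_ptr member makes the struct move-only");
static_assert(std::is_copy_constructible<ScoreWithShared>::value,
              "shared_ptr member keeps the struct copyable");

// Returns true if a copy aliases the same Signals object as the original.
bool copiesShareDetail() {
  ScoreWithShared A;
  A.Detail = std::make_shared<Signals>();
  ScoreWithShared B = A; // copy: both now point at one Signals instance
  return A.Detail == B.Detail;
}
```

The aliasing shown by `copiesShareDetail` is one reason such a workaround is fragile, which is consistent with it being dropped once the per-completion callback made the extra fields unnecessary.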

Remove unused import.

sammccall accepted this revision. Mar 4 2020, 10:58 AM

sammccall added inline comments.

clang-tools-extra/clangd/CodeComplete.h
136	First say what the behavior/API is (called once for each result...), Then justify it :)
139	I'd suggest including the final score in the signature rather than recompute it, just so the contract is really clear and simple (results yielded in arbitrary order, will be ranked by -score). Please spell this out.
140	This doesn't need to be a pointer, std::function is copy/movable and supports nullptr as a sentinel value.
clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp
1048	Nit: typically `auto` here (the anonymous lambda type) and let it convert to function implicitly when needed. No need for `-> type` in trivial cases
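Two of the comments above (line 140 in CodeComplete.h and line 1048 in the tests) rely on standard `std::function` behavior: an empty `std::function` tests as false and can be reset with `nullptr`, and a lambda held in an `auto` variable converts implicitly when assigned. A minimal sketch, with the hypothetical names `Opts`, `OnResult`, `notify` and `demoSum`:

```cpp
#include <cassert>
#include <functional>

using Callback = std::function<void(int)>;

struct Opts {
  Callback OnResult; // default-constructed std::function is empty
};

// Invokes the callback if one is set; returns whether it was invoked.
bool notify(const Opts &O, int Value) {
  if (!O.OnResult) // empty std::function converts to false
    return false;
  O.OnResult(Value);
  return true;
}

int demoSum() {
  int Sum = 0;
  // `auto` keeps the closure type; no explicit std::function or `-> void`.
  auto Add = [&Sum](int V) { Sum += V; };

  Opts O;
  bool CalledWhenEmpty = notify(O, 1); // no callback set: nothing happens

  O.OnResult = Add; // implicit conversion from the closure to std::function
  notify(O, 2);
  notify(O, 3);

  O.OnResult = nullptr; // nullptr resets it to the empty state
  notify(O, 4);

  return CalledWhenEmpty ? -1 : Sum;
}
```

Here `demoSum()` only accumulates the values passed while a callback was installed, so no separate pointer or boolean flag is needed to represent "instrumentation disabled".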

This revision is now accepted and ready to land. Mar 4 2020, 10:58 AM

Harbormaster completed remote builds in B48076: Diff 248240. Mar 4 2020, 12:21 PM

Harbormaster completed remote builds in B48073: Diff 248233.

Forgot to mention: I also think the trace approach certainly has things going for it, or even parsing out the messages from the existing logs.
But in this particular case the callback happens to be extremely convenient and also not invasive (since the data structures are already exposed, code complete has an opts struct etc). And since this is for analysis we have a lot of flexibility to rework if it stops being easy to maintain.

In D75603#1906418, @sammccall wrote:

Forgot to mention: I also think the trace approach certainly has things going for it, or even parsing out the messages from the existing logs.
But in this particular case the callback happens to be extremely convenient and also not invasive (since the data structures are already exposed, code complete has an opts struct etc). And since this is for analysis we have a lot of flexibility to rework if it stops being easy to maintain.

As discussed offline, I was rather afraid of the initial version of the patch, but the final version seems OK as it only adds a single field to CodeCompleteOptions.

Addressed comments.

Populated score in CodeCompletion before invoking the callback.
Tested that CodeCompletion is scored
Updated comment for callback.

usaxena95 added inline comments. Mar 5 2020, 3:07 AM

clang-tools-extra/clangd/CodeComplete.h
139	Couldn't add it to the signature since inner classes cannot be forward declared. Since `CodeCompletion` contains the `Score`, I have populated this field in the `CodeCompletion` (and also `CompletionTokenRange`) as done in `toCodeCompleteResult` to be consistent. Also tested that the CodeCompletion is scored.

Passed score as a float as an explicit argument of the callback.
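With the score passed explicitly, a consumer can rebuild the ranking itself from the unordered callback invocations. The sketch below assumes a simplified, hypothetical callback shape (name plus score only) and a stand-in producer `yieldCandidates`; it illustrates the "yielded in arbitrary order, ranked by -score" contract rather than clangd's actual ranking code.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified callback shape: only the name and the final score.
using RecordFn = std::function<void(const std::string &Name, float Score)>;
using Recorded = std::pair<std::string, float>;

// Stand-in for the completion engine: yields candidates in no particular order.
void yieldCandidates(const RecordFn &Record) {
  Record("xy2", 0.5f);
  Record("xy1", 1.5f);
  Record("xyz", 1.0f);
}

// Records every candidate, then ranks by descending score (i.e. by -Score).
std::vector<std::string> rankedNames() {
  std::vector<Recorded> Items;
  yieldCandidates([&Items](const std::string &Name, float Score) {
    Items.emplace_back(Name, Score);
  });
  std::stable_sort(Items.begin(), Items.end(),
                   [](const Recorded &A, const Recorded &B) {
                     return A.second > B.second;
                   });
  std::vector<std::string> Names;
  for (const auto &Item : Items)
    Names.push_back(Item.first);
  return Names;
}
```

Since every candidate is seen before any Top-N truncation, an analysis tool built this way can also experiment with alternative scoring functions over the same recorded signals.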

Closed by commit rGe397a0a5c3c0: [clangd] Add instrumentation mode in clangd for metrics collection. (authored by usaxena95). Mar 5 2020, 3:51 AM

This revision was automatically updated to reflect the committed changes.

Harbormaster completed remote builds in B48165: Diff 248416. Mar 5 2020, 4:56 AM

Harbormaster completed remote builds in B48167: Diff 248422. Mar 5 2020, 5:29 AM

Diff 248429

clang-tools-extra/clangd/CodeComplete.h

[13 lines not shown]
 #ifndef LLVM_CLANG_TOOLS_EXTRA_CLANGD_CODECOMPLETE_H
 #define LLVM_CLANG_TOOLS_EXTRA_CLANGD_CODECOMPLETE_H

 #include "Headers.h"
 #include "Logger.h"
 #include "Path.h"
 #include "Protocol.h"
+#include "Quality.h"
 #include "index/Index.h"
 #include "index/Symbol.h"
 #include "index/SymbolOrigin.h"
 #include "clang/Sema/CodeCompleteConsumer.h"
 #include "clang/Sema/CodeCompleteOptions.h"
 #include "clang/Tooling/CompilationDatabase.h"
 #include "llvm/ADT/Optional.h"
 #include "llvm/ADT/SmallVector.h"
 #include "llvm/ADT/StringRef.h"
 #include "llvm/Support/Error.h"
+#include <functional>
 #include <future>

 namespace clang {
 class NamedDecl;
 namespace clangd {
 struct PreambleData;
+struct CodeCompletion;

 struct CodeCompleteOptions {
   /// Returns options that can be passed to clang's completion engine.
   clang::CodeCompleteOptions getClangCompleteOpts() const;

   /// When true, completion items will contain expandable code snippets in
   /// completion (e.g. `return ${1:expression}` or `foo(${1:int a}, ${2:int
   /// b})).
[78 lines not shown; enclosing context: enum CodeCompletionParse {]
     /// Return an error if this fails.
     AlwaysParse,
     /// Run the parser if inputs (preamble) are ready.
     /// Otherwise, use text-based completion.
     ParseIfReady,
     /// Always use text-based completion.
     NeverParse,
   } RunParser = ParseIfReady;

+  /// Callback invoked on all CompletionCandidate after they are scored and
+  /// before they are ranked (by -Score). Thus the results are yielded in
+  /// arbitrary order.
+  ///
+  /// This callbacks allows capturing various internal structures used by clangd
+  /// during code completion. Eg: Symbol quality and relevance signals.
+  std::function<void(const CodeCompletion &, const SymbolQualitySignals &,
+                     const SymbolRelevanceSignals &, float Score)>
+      RecordCCResult;
 };

 // Semi-structured representation of a code-complete suggestion for our C++ API.
 // We don't use the LSP structures here (unlike most features) as we want
 // to expose more data to allow for more precise testing and evaluation.
 struct CodeCompletion {
   // The unqualified name of the symbol or other completion item.
   std::string Name;
[59 lines not shown; enclosing context: struct Scores {]
     // independent of the query.
     // e.g. symbols with lots of incoming references have higher quality.
     float Quality = 0.f;
     // Relevance describes how well this candidate matched the query.
     // e.g. symbols from nearby files have higher relevance.
     float Relevance = 0.f;
   };
   Scores Score;

   /// Indicates if this item is deprecated.
   bool Deprecated = false;

   // Serialize this to an LSP completion item. This is a lossy operation.
   CompletionItem render(const CodeCompleteOptions &) const;
 };
 raw_ostream &operator<<(raw_ostream &, const CodeCompletion &);
 struct CodeCompleteResult {
[87 lines not shown]

clang-tools-extra/clangd/CodeComplete.cpp

[1,454 lines not shown; enclosing context: auto IndexResults = (Opts.Index && allowIndex(Recorder->CCContext))]
                             ? queryIndex()
                             : SymbolSlab();
     trace::Span Tracer("Populate CodeCompleteResult");
     // Merge Sema and Index results, score them, and pick the winners.
     auto Top =
         mergeResults(Recorder->Results, IndexResults, /*Identifiers*/ {});
     return toCodeCompleteResult(Top);
   }

   CodeCompleteResult
   toCodeCompleteResult(const std::vector<ScoredBundle> &Scored) {
     CodeCompleteResult Output;

     // Convert the results to final form, assembling the expensive strings.
     for (auto &C : Scored) {
       Output.Completions.push_back(toCodeCompletion(C.first));
       Output.Completions.back().Score = C.second;
[183 lines not shown; enclosing context: void addCandidate(TopN<ScoredBundle, ScoredBundleGreater> &Candidates,]
     Scores.Quality = Quality.evaluate();
     Scores.Relevance = Relevance.evaluate();
     Scores.Total = evaluateSymbolAndRelevance(Scores.Quality, Scores.Relevance);
     // NameMatch is in fact a multiplier on total score, so rescoring is sound.
     Scores.ExcludingName = Relevance.NameMatch
                                ? Scores.Total / Relevance.NameMatch
                                : Scores.Quality;

+    if (Opts.RecordCCResult)
+      Opts.RecordCCResult(toCodeCompletion(Bundle), Quality, Relevance,
+                          Scores.Total);
+
     dlog("CodeComplete: {0} ({1}) = {2}\n{3}{4}\n", First.Name,
          llvm::to_string(Origin), Scores.Total, llvm::to_string(Quality),
          llvm::to_string(Relevance));

     NSema += bool(Origin & SymbolOrigin::AST);
     NIndex += FromIndex;
     NSemaAndIndex += bool(Origin & SymbolOrigin::AST) && FromIndex;
     NIdent += bool(Origin & SymbolOrigin::Identifier);
[204 lines not shown]

clang-tools-extra/clangd/unittests/CodeCompleteTests.cpp

[23 lines not shown]
 #include "clang/Sema/CodeCompleteConsumer.h"
 #include "clang/Tooling/CompilationDatabase.h"
 #include "llvm/Support/Error.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Testing/Support/Error.h"
 #include "gmock/gmock.h"
 #include "gtest/gtest.h"
 #include <condition_variable>
+#include <functional>
 #include <mutex>
+#include <vector>

 namespace clang {
 namespace clangd {

 namespace {
 using ::llvm::Failed;
 using ::testing::AllOf;
 using ::testing::Contains;
[995 lines not shown]
     template <template <class> class TT> int foo() {
       int a = ^
     }
   )cpp")
                          .Completions;
   EXPECT_THAT(Completions, Contains(Named("TT")));
 }

+TEST(CompletionTest, RecordCCResultCallback) {
+  std::vector<CodeCompletion> RecordedCompletions;
+  CodeCompleteOptions Opts;
+  Opts.RecordCCResult = [&RecordedCompletions](const CodeCompletion &CC,
+                                               const SymbolQualitySignals &,
+                                               const SymbolRelevanceSignals &,
+                                               float Score) {
+    RecordedCompletions.push_back(CC);
+  };
+
+  completions("int xy1, xy2; int a = xy^", /*IndexSymbols=*/{}, Opts);
+  EXPECT_THAT(RecordedCompletions,
+              UnorderedElementsAre(Named("xy1"), Named("xy2")));
+}
+
 SignatureHelp signatures(llvm::StringRef Text, Position Point,
                          std::vector<Symbol> IndexSymbols = {}) {
   std::unique_ptr<SymbolIndex> Index;
   if (!IndexSymbols.empty())
     Index = memIndex(IndexSymbols);

   MockFSProvider FS;
   MockCompilationDatabase CDB;
[1,762 lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add instrumentation mode in clangd for metrics collection.
Closed · Public
