This is an archive of the discontinued LLVM Phabricator instance.


Differential D47183

[clangd] Support multiple sema code completion callbacks.
Abandoned · Public

Authored by ioeric on May 22 2018, 1:24 AM.


Details

Reviewers

ilya-biryukov
sammccall

Summary

Currently, we only handle the first callback from Sema code completion and ignore results from any following callbacks. This causes completion results to be lost when Sema tries multiple contexts.

For example, we wouldn't get any completion results in the following code, because the first attempted context is natural language, which has no candidates. The parser then backtracks and retries completion with AST semantics, which would find the candidate "::x".

void f(const char*, int);
#define F(x) f(#x, x)
int x;
int main() {
	F(::^);
}

Diff Detail

Repository

rCTE Clang Tools Extra

Build Status

Buildable 18439
Build 18439: arc lint + arc unit

Event Timeline

ioeric created this revision. May 22 2018, 1:24 AM

Herald added subscribers: cfe-commits, jkorous, MaskRay, klimek. May 22 2018, 1:24 AM

ioeric edited the summary of this revision. May 22 2018, 1:25 AM

ilya-biryukov added inline comments. May 22 2018, 1:38 AM

clangd/CodeComplete.cpp, line 457:

Could we fix this in the first attempt? All the code around completion is really not prepared for multiple callbacks: the index will be queried multiple times, we will store and log only the last CCContext rather than all of them, etc.
I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks (i.e. with deduplication and proper merging of the results coming from multiple callbacks, proper logging, and no multiple identical requests to the index).

I would suggest the following measures as a hacky intermediate solution:
1) Ignore natural language completion. The rationale: VSCode does analogous completion on empty results anyway, and AFAIK clang does not provide any useful results on top of that. Other clients that we have can (and should?) probably do the same.
2) Only record the first non-natural-language completion attempt. Ignore the rest and log the failed attempts.

ioeric added inline comments. May 22 2018, 1:58 AM

clangd/CodeComplete.cpp, line 457:

> index will be queried multiple times

Patterns of multi-context callbacks I've seen are:
1) natural language + name: this happens mostly when parsing macros with stringification.
2) name + name: this happens in the pre-existing unit test case. I don't really understand why and how often this comes up, but I think the duplication should be eliminated.
3) natural language + recovery: I haven't looked into what the recovery context does.

> we will store and log only the last CCContext

What's the concern about storing only the last context?

> I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks

I'm not sure how we could (fully) get away with one callback without significantly changing Sema parsing. This seems to be an expensive approach, though.

> with deduplication and proper merging of the results coming from multiple callbacks

I agree. My impression is that multiple callbacks are not common and thus not as important (duplicates are better than no results IMO), but I might be wrong in thinking this is uncommon. I was going to do this in a follow-up patch to avoid a big patch, but I'm happy to do the deduplication in the same patch if you prefer.

> proper logging

Could you point out what logging is missing?

> no multiple identical requests to the index

For the context combinations I've seen (natural language + name, natural language + recovery), the index is still queried once. If Sema does call name completion multiple times with different contexts, that could yield two distinct index queries, which we would still need to issue (I don't see a big problem with that if it's not a common case). For an identical context that is called multiple times, we could potentially cache the results.

> I would suggest the following measures as a hacky intermediate solution:

I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems.

ilya-biryukov added inline comments. May 22 2018, 2:29 AM

clangd/CodeComplete.cpp, line 457:

> What's the concern about storing only the last context?

If we store just one context, the code might look as if there was only one callback. I was trying to make the point that 1) storing a list of contexts or 2) not storing the context at all are the two options that seem more suitable for the multi-callback case.

> I'm not sure how we could (fully) get away with one callback without significantly changing Sema parsing. This seems to be an expensive approach, though.

Yeah, there's certainly no way we can change Sema parsing. What we could do, though, is to not call code completion during tentative parsing. This shouldn't be too hard, and that's certainly the only case that can give new interesting results in practice, i.e. doing natural language/recovery twice will certainly not change the results.

> Could you point out what logging is missing?

Signalling that there were multiple completion callbacks, showing the context for each of those, etc. We seem to log individual callbacks currently, but a small summary of how many callbacks were called would be nice too.

> > no multiple identical requests to the index
> For the context combinations I've seen (natural language + name, natural language + recovery), the index is still queried once. If Sema does call name completion multiple times with different contexts, that could yield two distinct index queries, which we would still need to issue (I don't see a big problem with that if it's not a common case). For an identical context that is called multiple times, we could potentially cache the results.

IIUC, multiple callbacks can also happen because of tentative parsing. It means we could easily get lots of callbacks on ambiguous C++ grammar constructs. We just need to make sure we don't make identical calls to the index in those cases. Caching index requests in the ongoing completion should definitely do it.

> > I would suggest the following measures as a hacky intermediate solution:
> I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems.

From my observations, all Sema (i.e. non-recovery/natural language) contexts provide mostly similar results or don't trigger together, i.e. we won't ever get non-member completion after member completion. In theory, the current completion API can provide results that we're going to miss if we ignore other contexts. In practice, I bet we would be fine.
That being said, merging completion results is also an option that seems good, although I think that's something clang should handle in its code completion internally, and clients shouldn't have to care about it.

OK, it turned out that there are a few things that I overlooked:

1) Top-N ranking would be broken across different callbacks. We would need to keep a single TopN for all callbacks.
2) Due to 1), materialization of TopN candidates would happen after all callbacks have finished but before Sema is destroyed. The problem is that completion results from one Sema callback do not always outlive that callback, which would break our lazy materialization.

I think multi-callback support requires more careful design. To fix code completion in macros, I'll do the context filtering (as Ilya suggested) for now.

Dropping this in favor of D47256.

Revision Contents

Path                                      Size
clangd/CodeComplete.cpp                   28 lines
unittests/clangd/CodeCompleteTests.cpp    25 lines

Diff 147957

clangd/CodeComplete.cpp

[425 unchanged lines not shown]
@@ struct CompletionRecorder : public CodeCompleteConsumer {
   std::vector<CodeCompletionResult> Results;
   CodeCompletionContext CCContext;
   Sema *CCSema = nullptr; // Sema that created the results.
   // FIXME: Sema is scary. Can we store ASTContext and Preprocessor, instead?
 
   void ProcessCodeCompleteResults(class Sema &S, CodeCompletionContext Context,
                                   CodeCompletionResult *InResults,
                                   unsigned NumResults) override final {
-    if (CCSema) {
-      log(llvm::formatv(
-          "Multiple code complete callbacks (parser backtracked?). "
-          "Dropping results from context {0}, keeping results from {1}.",
-          getCompletionKindString(this->CCContext.getKind()),
-          getCompletionKindString(Context.getKind())));
-      return;
-    }
+    log("Processing completion results in context " +
+        getCompletionKindString(Context.getKind()));
     // Record the completion context.
     CCSema = &S;
     CCContext = Context;
 
     // Retain the results we might want.
     for (unsigned I = 0; I < NumResults; ++I) {
       auto &Result = InResults[I];
       // Drop hidden items which cannot be found by lookup after completion.
       // Exception: some items can be named by using a qualifier.
       if (Result.Hidden && (!Result.Qualifier || Result.QualifierIsInformative))
         continue;
       if (!Opts.IncludeIneligibleResults &&
           (Result.Availability == CXAvailability_NotAvailable ||
            Result.Availability == CXAvailability_NotAccessible))
         continue;
       // Destructor completion is rarely useful, and works inconsistently.
       // (s.^ completes ~string, but s.~st^ is an error).
       if (dyn_cast_or_null<CXXDestructorDecl>(Result.Declaration))
         continue;
       // We choose to never append '::' to completion results in clangd.
       Result.StartsNestedNameSpecifier = false;
+      // FIXME: the same result can be added multiple times as the callback can
+      // be called more than once. We may want to deduplicate identical results.
       Results.push_back(Result);
     }
     ResultsCallback();
   }
 
   CodeCompletionAllocator &getAllocator() override { return *CCAllocator; }
   CodeCompletionTUInfo &getCodeCompletionTUInfo() override { return CCTUInfo; }
 
[249 unchanged lines not shown]
@@ Clang->getPreprocessor().addPPCallbacks(collectInclusionsInMainFileCallback(
           Includes->get()->addExisting(std::move(Inc));
         }));
   }
   if (!Action.Execute()) {
     log("Execute() failed when running codeComplete for " + Input.FileName);
     return false;
   }
   Action.EndSourceFile();
+  if (Includes)
+    Includes->reset(); // Make sure this doesn't out-live Clang.
 
   return true;
 }
 
 // Should we perform index-based completion in a context of the specified kind?
 // FIXME: consider allowing completion, but restricting the result types.
 bool contextAllowsIndex(enum CodeCompletionContext::Kind K) {
   switch (K) {
[137 unchanged lines not shown]
@@ CompletionList run(const SemaCompleteInput &SemaCCInput) && {
     // - completion results based on the AST.
     // - partial identifier and context. We need these for the index query.
     CompletionList Output;
     auto RecorderOwner = llvm::make_unique<CompletionRecorder>(Opts, [&]() {
       assert(Recorder && "Recorder is not set");
       assert(Includes && "Includes is not set");
       // If preprocessor was run, inclusions from preprocessor callback should
       // already be added to Inclusions.
-      Output = runWithSema();
-      Includes.reset(); // Make sure this doesn't out-live Clang.
+      auto Items = runWithSema();
+      Output.items.insert(Output.items.end(),
+                          std::make_move_iterator(Items.begin()),
+                          std::make_move_iterator(Items.end()));
       SPAN_ATTACH(Tracer, "sema_completion_kind",
                   getCompletionKindString(Recorder->CCContext.getKind()));
     });
 
     Recorder = RecorderOwner.get();
     semaCodeComplete(std::move(RecorderOwner), Opts.getClangCompleteOpts(),
                      SemaCCInput, &Includes);
 
     SPAN_ATTACH(Tracer, "sema_results", NSema);
     SPAN_ATTACH(Tracer, "index_results", NIndex);
     SPAN_ATTACH(Tracer, "merged_results", NBoth);
     SPAN_ATTACH(Tracer, "returned_results", Output.items.size());
     SPAN_ATTACH(Tracer, "incomplete", Output.isIncomplete);
     log(llvm::formatv("Code complete: {0} results from Sema, {1} from Index, "
                       "{2} matched, {3} returned{4}.",
                       NSema, NIndex, NBoth, Output.items.size(),
                       Output.isIncomplete ? " (incomplete)" : ""));
     assert(!Opts.Limit || Output.items.size() <= Opts.Limit);
+    Output.isIncomplete = Incomplete;
     // We don't assert that isIncomplete means we hit a limit.
     // Indexes may choose to impose their own limits even if we don't have one.
     return Output;
   }
 
 private:
   // This is called by run() once Sema code completion is done, but before the
   // Sema data structures are torn down. It does all the real work.
-  CompletionList runWithSema() {
+  std::vector<CompletionItem> runWithSema() {
     Filter = FuzzyMatcher(
         Recorder->CCSema->getPreprocessor().getCodeCompletionFilter());
     // Sema provides the needed context to query the index.
     // FIXME: in addition to querying for extra/overlapping symbols, we should
     // explicitly request symbols corresponding to Sema results.
     // We can use their signals even if the index can't suggest them.
     // We must copy index results to preserve them, but there are at most Limit.
     auto IndexResults = queryIndex();
     // Merge Sema and Index results, score them, and pick the winners.
     auto Top = mergeResults(Recorder->Results, IndexResults);
     // Convert the results to the desired LSP structs.
-    CompletionList Output;
+    std::vector<CompletionItem> Output;
     for (auto &C : Top)
-      Output.items.push_back(toCompletionItem(C.first, C.second));
-    Output.isIncomplete = Incomplete;
+      Output.push_back(toCompletionItem(C.first, C.second));
     return Output;
   }
 
   SymbolSlab queryIndex() {
     if (!Opts.Index || !allowIndex(Recorder->CCContext))
       return SymbolSlab();
     trace::Span Tracer("Query index");
     SPAN_ATTACH(Tracer, "limit", Opts.Limit);
[147 unchanged lines not shown]

unittests/clangd/CodeCompleteTests.cpp

[667 unchanged lines not shown]
@@ auto Results = completions(R"cpp(
 }
 )cpp");
 
   EXPECT_THAT(Results.items, Contains(Labeled("clang")));
   EXPECT_THAT(Results.items, Not(Contains(Labeled("clang::"))));
 }
 
 TEST(CompletionTest, BacktrackCrashes) {
-  // Sema calls code completion callbacks twice in these cases.
   auto Results = completions(R"cpp(
 namespace ns {
 struct FooBarBaz {};
 } // namespace ns
 
 int foo(ns::FooBar^
 )cpp");
 
-  EXPECT_THAT(Results.items, ElementsAre(Labeled("FooBarBaz")));
+  // Sema calls code completion callbacks twice in these cases.
+  // FIXME: deduplicate identical results.
+  EXPECT_THAT(Results.items, Contains(Labeled("FooBarBaz")));
 
   // Check we don't crash in that case too.
   completions(R"cpp(
 struct FooBarBaz {};
 void test() {
 if (FooBarBaz * x^) {}
 }
 )cpp");
 }
 
+TEST(CompletionTest, CompleteInMacroWithStringification) {
+  auto Results = completions(R"cpp(
+void f(const char *, int x);
+#define F(x) f(#x, x)
+
+namespace ns {
+int X;
+int Y;
+} // namespace ns
+
+int f(int input_num) {
+F(ns::^)
+}
+}
+)cpp");
+
+  EXPECT_THAT(Results.items,
+              UnorderedElementsAre(Named("X"), Named("Y")));
+}
+
 TEST(CompletionTest, CompleteInExcludedPPBranch) {
   auto Results = completions(R"cpp(
 int bar(int param_in_bar) {
 }
 
 int foo(int param_in_foo) {
 #if 0
 par^
[264 unchanged lines not shown]