This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support multiple sema code completion callbacks.
AbandonedPublic

Authored by ioeric on May 22 2018, 1:24 AM.

Details

Summary

Currently, we only handle the first callback from sema code completion
and ignore results from potential following callbacks. This causes
causes loss of completion results when multiple contexts are tried by Sema.

For example, we wouldn't get any completion result in the following completion
as the first attemped context is natural language which has no
candidate. The parser would backtrack and tried a completion with AST
semantic, which would find candidate "::x".

void f(const char*, int);
#define F(x) f(#x, x)
int x;
void main() {
	F(::^);
}

Event Timeline

ioeric created this revision.May 22 2018, 1:24 AM
ioeric edited the summary of this revision. (Show Details)May 22 2018, 1:25 AM
ilya-biryukov added inline comments.May 22 2018, 1:38 AM
clangd/CodeComplete.cpp
457

Could we fix this in the first attempt?
All the code around completion is really not prepared for multiple callbacks: index will be queried multiple times, we will store and log only the last CCContext, not all of them, etc.
I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks (i.e. with deduplication and proper merging of the results coming from multiple callbacks, proper logging, no multiple identical requests to the index).

I would suggest the following measures as a hacky intermediate solution:

  • Ignore natural language completion. The rationale: VSCode does analogous completion on empty results anyway, AFAIK clang does not provide any useful results on top of that. Other clients that we have can (and should?) probably do the same.
  • Only record the first non-natural language completion attempt. Ignore the rest and log the failed attempts.
ioeric added inline comments.May 22 2018, 1:58 AM
clangd/CodeComplete.cpp
457

index will be queried multiple times

Patterns of multi-context callbacks I've seen are:

  1. natural language + Name: this happens mostly when parsing macros with stringification.
  2. name + name: this happens in the pre-existing unit test case. I don't really understand why and how often this comes up, but I think the duplication should be eliminated.
  3. language+recovery: haven't looked into what the recovery context does.

we will store and log only the last CCContext

What's the concern about storing only the last context?

I'd argue we should aim to either provide a single-callback completion or rewrite the whole code around completion to properly handle multiple callbacks (

I'm not sure how we could (fully) get away with one callback without significantly changing sema parsing. This seems to be an expensive approach though.

with deduplication and proper merging of the results coming from multiple callbacks

I agree. My impression is that multiple callbacks are not common and thus not as important (duplicates are better than no results IMO). But I might be wrong thinking this is uncommon. I was going to do this in a followup patch to avoid a big patch, but I'm happy to do the deduplication in the same patch if you prefer.

proper logging

Could you point out what logging is missing?

no multiple identical requests to the index

For context combinations I've seen (natural language + name, natural language + recovery), index is still queries once. If sema does decide to call name multiple times with context that would potentially yield two index queries, we could still need to query indexes twice (don't see a big problems doing this if not a common case). For identical context that is called multiple times, we could cache potentially results.

I would suggest the following measures as a hacky intermediate solution:

I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems.

ilya-biryukov added inline comments.May 22 2018, 2:29 AM
clangd/CodeComplete.cpp
457

What's the concern about storing only the last context?

If we store just one context, the code might look as if there was only one callback. I was trying to make a point that 1) storing a list of contexts or 2) not storing the context at all, are the two options that seem to be more suitable for multi-callback case.

I'm not sure how we could (fully) get away with one callback without significantly changing sema parsing. This seems to be an expensive approach though.

Yeah, there's certainly no way we can change sema parsing.
What we could do, though, is to not call code completion during tentative parsing. This shouldn't be too hard and that's certainly the only case that can give new interesting results in practice, i.e. doing natural language/recovery twice will certainly not change the results.

Could you point out what logging is missing?

Signalling that there were multiple completion callbacks, showing context for each of those, etc. We seem to log individual callbacks currently, but a small summary of how many callbacks were called would be nice too.

no multiple identical requests to the index

For context combinations I've seen (natural language + name, natural language + recovery), index is still queries once. If sema does decide to call name multiple times with context that would potentially yield two index queries, we could still need to query indexes twice (don't see a big problems doing this if not a common case). For identical context that is called multiple times, we could cache potentially results.

IIUC, multiple callbacks can also happen because of the tentative parsing. It means we could easily get lots of callbacks on ambiguous C++ grammar constructs.
We just need to make sure we don't do identical calls to the index in those cases. Caching index requests in the ongoing completion should definitely do it.

I would suggest the following measures as a hacky intermediate solution:

I think natural language is only one of the contexts that could result in multiple callbacks, so I don't think this would fully resolve our problems.

From my observations, all sema (i.e. non-recovery/natural lang) contexts provide mostly similar results or don't trigger together, i.e. we won't ever get non-member completion after member completions.
In theory, current completion API can provide results that we're gonna miss if we ignore other contexts.
In practice, I bet we would be fine.

That being said, merging of completion results is also an option that seems good. Albeit, I think that's something clang should handle in its code completion internally and clients shouldn't care about.

OK, it turned out that there are a few things that I overlooked:

  1. Top-N Ranking would be broken among different callbacks. We would need to keep a single TopN for all callbacks.
  2. Due to 1), materialization of TopN candidates would happen after all callbacks have finished but before sema is destroyed. The problem is that the lifetime of completion results from one sema callback do not always outlive the callback, which would break our lazy materialization.

I think multi-callback support should require more careful design. And to fix the code completion in macros, I'll do the context filtering (as Ilya suggested) for now.

ioeric abandoned this revision.May 23 2018, 6:27 AM

Dropping this in favor of D47256