This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Use accessible scopes to query indexes for global code completion.
ClosedPublic

Authored by hokein on Jan 15 2018, 8:19 AM.

Details

Summary
  • For qualified completion (foo::a^)
    • unresolved qualifier - use global namespace ("::")
    • resolved qualifier - use all accessible namespaces inside the resolved qualifier.
  • For unqualified completion (vec^), use scopes that are accessible from the scope from which code completion occurs.

Event Timeline

hokein created this revision.Jan 15 2018, 8:19 AM

Discussed this offline.
If we move most to collect visited DeclContexts to CodeCompletionDeclConsumer (from CodeComplete.cpp) and provide a list of DeclContexts to CodeCompletionConsumer, we will be able to get rid of running lookup manually in clangd.

Sounds like implementation is in flux, but a few comments that might be relevant.

clangd/CodeComplete.cpp
314–350

Probably not for this patch, but I just realized...

Currently if the completion is not qualified, we don't track what namespaces are in scope.
This is really relevant for ranking though.
e.g. "using namespace std; namespace foo { v^ }"
The index should apply namespace-based bonuses: foo::value > std::vector > boost::variant.
At the moment we don't provide enough information to do this. But this optional<QueryScopes> probably wants to be a QueryScopes + bool Qualified, or two QueryScopes (one for restrictions, and one for scoring)

So I'd suggest either here or in queryscopes:

// TODO: for unqualified completions, capture scopes and use for scoring.
316

this could also be just a using, rather than a struct - do you see more members here?

322

Just to check, if the user types:
"vec" --> None
"::vec" --> {""}
"std::vec" --> {"std"}
"using namespace std; vec" --> None
"using namespace std; ::vec" --> {"", "std"}

is this right?

707

you need to set loadexternal to false to avoid deserializing the whole preamble.
Ilya says this is going to get merged with lookup during code-completion, which would be nice :-)

ilya-biryukov added inline comments.Jan 18 2018, 8:07 AM
clangd/CodeComplete.cpp
322

I think the idea was to have (I only highlight the differences):
"vec" --> {""}
"using namespace std; vec" --> {"", "std"}

@hokein , or am I getting it wrong?

sammccall added inline comments.Jan 18 2018, 10:27 AM
clangd/CodeComplete.cpp
322

You're probably right, just want to be sure we're talking about the same thing.

There's two layers here: the context detected from sema, and what we're going to send the index. The layering is more relevant now that more of this moves out of clangd.

for "vec", we should be sending {""} to the index for now, and later move towards doing global completion.
But if possible the detection code should report None, and the query-index code should translate it - there's no reason the detection code needs to be wrong just because clangd can't do qualifier insertion/smart ranking/etc.

That said, per our discussion this morning, the detected state shouldn't really be Optional<vector<string>>, but rather struct { vector<string> AccessibleScope, optional<string> UnresolvedQualifier } or something like that...

ilya-biryukov added inline comments.Jan 19 2018, 2:54 AM
clangd/CodeComplete.cpp
322
  • Totally agree, this patch only fills the vector<string> AccessibleScope field, and optional<string> UnresolvedQualifier could be added in the follow-up patch when it's actually handled.
  • It still makes sense to have optional<QueryScopes> to distinguish cases when clang did not run the lookup during completion and the scopes were not set (i.e. completion inside macros, etc.)
hokein updated this revision to Diff 130618.Jan 19 2018, 7:25 AM
hokein marked 4 inline comments as done.
  • Rebase the patch to latest
  • Update the patch based on the offline discussion

Updating D42073: [clangd] Query all visible scopes based on all visible using-namespace declarations

and containing namespace for global qualified code completion.

hokein retitled this revision from [clangd] Query all visible scopes based on all visible using-namespace declarations and containing namespace for global qualified code completion. to [clangd] Use accessible scopes to query indexes for global code completion..Jan 19 2018, 7:26 AM
hokein edited the summary of this revision. (Show Details)
hokein removed a reviewer: jkorous-apple.
hokein added inline comments.Jan 19 2018, 7:32 AM
clangd/CodeComplete.cpp
322

@ilya-biryukov your are right.

Based on our discussion last morning, the accessible scopes:

//   qualified completion
//   "::vec" => {"::"}
//   "using namespace std; ::vec^" => {"::", "std"}
//   "namespace ns {using namespace std;} ns::^" => {"ns", "std"}
//   "std::vec^" => {"::"}  // unresolved "std::"
//
//   unqualified completion
//   "vec^" => {"::"}
//   "using namespace std; vec^" => {"::", "std"}
//   "using namespace std; namespace ns { vec^ }" => {"ns", "std", "::"}

With rL322945, now it is sensible to include all (qualified/unqualified completion) in this patch.

707

not needed now as we don't do lookup manually in clangd.

I think this mostly does the right thing, but I find some of the code structure/description confusing.
I might have made misguided comments, happy to sit down together and work this out :-)

clangd/CodeComplete.cpp
318

This comment is a bit confusing.
If this struct is trying to model "what we're going to query the index for" then it's not needed - just use vector<string>.
If it's trying to model the context of the identifier we're completing, then this comment is inaccurate and should be something like The namespace context of the partial identifier we're trying to complete.
Then add something like We use this when querying the index for more results.

318

nit: avoid \brief where possible - the first sentence is used by default.

321

it's not totally clear to me whether you're talking about *our* sema-based results, or the sema infrastructure we're using, or it's a mistake and you mean our index code completion.

Maybe just:

// FIXME: When we resolve part of a scope chain (e.g. "known::unknown::ident"), we should
// expand the known part rather than treating the whole thing as unknown.

I think this should go in the implementation, rather than the type comment.

329–330

Hmm, you're renamed this from SpecifiedScope to QualifiedScope. I don't understand this name, can you explain what it means?
(SpecifiedScope refers to the scope that the *user* specified when typing)

331

This is a bit hard to parse, and doesn't seem to describe the "unresolved by sema" case in a meaningful way.

What about:

The scopes we should look in, determined by Sema.
If the qualifier was fully resolved, we should look for completions in these scopes.
If there is an unresolved part of the qualifier, it should be resolved within these scopes.

That way we describe what we aim to do (which is pretty simple), and we can document deviations with FIXMEs.

336

for this case: why not "the containing namespace and all accessible namespaces"? You've implemented that below, and it seems like the right thing to do here.

342

you seem to have a comment-in-a-comment here.
nit: please line up the => so it forms a table.

I'd like to see an example `"namespace ns { vec^ }" => {"ns", ""}
(I know it's not what the code currently does, but IMO it should!)

344

This is ambiguous, does it mean The suffix of the user-writter qualifiers that Sema didn't resolve or The full scope qualifier as typed by the user?
(I'm hoping the former, but not sure).

348

this needs a new name. Previously it described one scope, and was being formatted to match the index representation.
Now it contains logic to determine the index query, so maybe scopesForIndexQuery()?

350

what is "VS"?

352

It seems like you have stored the scopes in an unknown format, and call "normalizeScope" defensively? Seems cleaner if you ensure both the scopes and unknown are in the form:

  • global = ""
  • top-level = "::foo"
  • nested = "::foo::bar"

Then this code can produce similarly-formatted output with just:

Results.push_back(VS);
if (UnresolvedQualifier)
  Results.back() += *UnresolvedQualifier;

We need to trim leading :: before sending to the index, but I think that's because we got the index API wrong. I'll send a patch to fix it.

863–864

This seems like too much fiddly logic inlined here.

Can't getQualifiedScopes(CodeCompletionContext, Sema) -> QualifiedScopes handle both cases here?

867

Please use a QualifiedScopes object to handle this case too.
It's simple: the unresolved part is "", and the list of accessible scopes is the same one you're computing here.

869

the global scope is spelled "" in both conventions we use (the current one, and the one we should be using instead :-)).

874

log doesn't work that way - you need to combine into a single message :-)

I think since we don't have verbosity levels, it's best to remove this - it's much more fine-grained than anything else we log, so it will be spammy.
(If we do want to log something, we should summarize the whole query.)

sammccall added inline comments.Jan 19 2018, 2:22 PM
clangd/CodeComplete.cpp
352

(that patch landed as rL323000)

ilya-biryukov added inline comments.Jan 22 2018, 2:55 AM
clangd/CodeComplete.cpp
708

NIT: s/dureing/during

hokein updated this revision to Diff 130883.Jan 22 2018, 6:54 AM
hokein marked 13 inline comments as done.

Address review comments.

Thanks for the comments!

clangd/CodeComplete.cpp
321

I was talking about the sema infrastructure .

329–330

My original thought was that this structure is used only for qualified completion (not unqualified completion).

336

Currently Sema doesn't support it, (that being said the visited contexts for this case is empty).

342

This was focusing on qualified completions only, the unqualified sample is provided separately (below).

Have made it together.

344

unfortunately, it is the second one.

350

means visible scope, renamed it to AS.

352

Thanks. Done.

863–864

Yes, we can, that would simplify the code.

874

I think the log here is pretty useful for debugging.

sammccall accepted this revision.Jan 22 2018, 10:44 AM
sammccall added inline comments.
clangd/CodeComplete.cpp
704–705

could you move this to be right after the closely-related SpecifiedScope struct? I can't remember why they got separated, probably my fault :-)

705

as Ilya pointed out to me, passing Sema around is a big scary thing that we should try to avoid. Here it's only used to get the text if the CXXScopeSpec is invalid. Can we pass just the SourceManager, or better yet, the text as a StringPiece?

712

do you need to remove any leading :: here?

874

SG. It might be worth more clearly tying the log message to the other code complete one:, e.g. "Code complete: fuzzyFind(\"{0}\", Scopes=[{1}])"

unittests/clangd/CodeCompleteTests.cpp
679–782

Can you factor out the common code into a function?

e.g.

vector<FuzzyFindRequest> captureIndexRequests(string example) {
  IndexRequestCollector Index; // actually the class can be local here
  ...
}

Like completions(), this avoids boilerplate in the tests.

This revision is now accepted and ready to land.Jan 22 2018, 10:44 AM
hokein updated this revision to Diff 131013.Jan 23 2018, 2:28 AM
hokein marked 5 inline comments as done.

Address remaining comments.

hokein updated this revision to Diff 131014.Jan 23 2018, 2:30 AM

Cleanup tests.

hokein added inline comments.Jan 23 2018, 2:31 AM
clangd/CodeComplete.cpp
705

SG, we don't need the Sema to compute the DeclContext from scope specifier. SourceManager is the only thing we need from Sema now.

712

Not needed, as Sema exclues the trailing "::". Added a comment.

hokein updated this revision to Diff 131018.Jan 23 2018, 2:57 AM

remove the leading "::".

clangd/CodeComplete.cpp
712

Aha, sorry -- I misread the comment. Yeah, we need to remove the leading "::", added a test to catch this bug.

hokein updated this revision to Diff 131019.Jan 23 2018, 2:59 AM

Format the test code.

This revision was automatically updated to reflect the committed changes.