This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Support scope proximity in code completion.
ClosedPublic

Authored by ioeric on Oct 11 2018, 3:12 AM.

Details

Summary

This should make all-scope completion more usable. Scope proximity for
indexes will be added in followup patch.

Diff Detail

Repository
rL LLVM

Event Timeline

ioeric created this revision.Oct 11 2018, 3:12 AM
sammccall added inline comments.Oct 11 2018, 5:58 AM
clangd/CodeComplete.cpp
558 ↗(On Diff #169180)

does this do the right thing if it's anonymous or inline, or a parent is?

For indexing, we call a function in AST.h, we should probably do something similar.

The catch is that function relies on NamedDecl::printQualifiedName, maybe we need to factor out the relevant part so we can call it on a DeclContext.

559 ↗(On Diff #169180)

shouldn't this be ""? That's a definite scope, not a failure to find one.
(It's not a *namespace* scope, but I'm not sure why that's important)

clangd/Quality.cpp
246 ↗(On Diff #169180)

If you're going to decide these numbers directly...

a) I think you should just return a multiplier here, rather than a 0-1 score
b) we should more aggressively downrank non-preferred symbols: currently by only ~1/3 vs symbols from a preferred scopes

e.g. I'd suggest returning 1, 2, 1.5, 1, 1, 1.5, 0.3 or similar

301 ↗(On Diff #169180)

I don't think "always in the accessible scope" is sufficient justification to give a score of 1.
However "we don't load top-level symbols from the preamble" might be?

clangd/Quality.h
92 ↗(On Diff #169180)

hmm, can/should we use FileProximity for this, and just transform the strings?

This isn't going to cache for symbols sharing a namespace, isn't going to handle "symbol is in a child namespace of an included namespace", and some other combinations.

ioeric updated this revision to Diff 169231.Oct 11 2018, 9:57 AM
ioeric marked 2 inline comments as done.
  • address review comments
ioeric added inline comments.Oct 11 2018, 9:58 AM
clangd/CodeComplete.cpp
558 ↗(On Diff #169180)

Good catch.

NamedDecl::printQualifiedName doesn't skip the (anonymous) part if the decl itself is anonymous, even with SuppressUnwrittenScope set. So I think we should explicitly filter those out here and then call printQualifiedName.

559 ↗(On Diff #169180)

Make sense.

I think we might still want to special-case the global scope (not here, maybe in proximity scoring) because:

  • if any scope is specified, it's probably more desireable than the global scope.
  • there are patterns like below in cpp files (instead of using enclosing namespace):
using namespace clang;
using namespace clang::clangd;

Currently, if the enclosing scope is "", then we would treat it the same as the scopes from using-namespace directives.

clangd/Quality.cpp
246 ↗(On Diff #169180)

a) I think you should just return a multiplier here, rather than a 0-1 score.

I tried this. Unfortunately, allowing <1 multiplier would make the default value (for sema proximity) tricky. And [0,1] proximity score seems to be less confusing as it's consistent with the file proximity scale. If [0,1] isn't enough, we could still increase the upper bound?

b) we should more aggressively downrank non-preferred symbols: currently by only ~1/3 vs symbols from a preferred scopes

It's ~1/3 only for the global scope vs non-preferred symbols, which seems reasonable to me. I worry penalizing non-preferred scopes too much can make cross-namespace completion less useable.

clangd/Quality.h
92 ↗(On Diff #169180)

It seems to me that it'd be less trivial to associate scopes with up/down traverses. Currently, the query scopes contain all enclosing namespaces, so the distance seems less important here. In general, I think the characteristics of scope proximity (e.g. all enclosing namespaces are already in query scopes) allow us to get away with something more trivial than file proximity?

This isn't going to cache for symbols sharing a namespace.

This seems to be less a concern if we are not calculating up/down distance.

isn't going to handle "symbol is in a child namespace of an included namespace"

This can be covered by SymbolScope.startswith(QueryScope)? It might not work well for using-namespaces, but it's unclear how important that is.

Still happy to consider FileDistance-approach as I might be missing something.

(BTW I wouldn't expect to see improvements on eval here, as

clangd/CodeComplete.cpp
558 ↗(On Diff #169180)

can you move this function to AST.h? e.g. printScope? It seems like an obvious peer to printQualifiedName.

clangd/Quality.cpp
246 ↗(On Diff #169180)

Unfortunately, allowing <1 multiplier would make the default value (for sema proximity) tricky.

There are three types of symbols:

  • those we get from sema only. Here your patch decides to give max boost (I think?) because we don't trust scope comparison for all these cases.
  • those we get from index only. We choose a boost by matching the stated scope to the query scopes.
  • those we get from sema and index. We should *probably* treat these like Sema-only and give max boost.

Note the category of results which we have neither sema nor index for doesn't actually exist. The framework allows for it and we should respect that as best we can, but it can't be the top priority.

So can't we just change the SemaScopeProximityScore to bool SemaSaysInScope or something and evaluate as SemaSaysInScope ? MaxBoost : QueryScopeProximity ? evaluateUsingQueryScopeProximity() : 1?

It's ~1/3 only for the global scope vs non-preferred symbols

Queried scopes (not most-preferred, and not the global scope!) get a boost of 1.6, while completely irrelevant scopes get a boost of 1. This is a relative penalty of only 37.5%.
(I think a more reasonable penalty would be in the range of 75% or 80%, i.e. a 3-4x boost for symbols that are in-scope)

clangd/Quality.h
92 ↗(On Diff #169180)

Currently, the query scopes contain all enclosing namespaces, so the distance seems less important here.

Sure. But there cases like using namespace std and then _1 which should resolve to std::placeholders::_1.
Or being in namespace clang::clangd and wanting to use a symbol clang::tooling::CompilationDatabase.

allow us to get away with something more trivial than file proximity?

Maybe we can. But we should only be trying to get away with it if it's some combination of {less complex, better, faster}. It doesn't seem clear to me that it's any of these.

This isn't going to cache for symbols sharing a namespace.

This seems to be less a concern if we are not calculating up/down distance.

Again, less of a concern, but not obviously a non-issue. With this implementation, you compare the scope string of each symbol to every query scope. With FileDistance, you hash the scope string and then almost-always hit the cache.

This can be covered by SymbolScope.startswith(QueryScope)?

You've updated the code for this, but now you're not giving nested-within-query-scope any penalty compared to directly-in-query-scope. All of this can be implemented, but I don't really understand why - does reusing the impl not seem simpler?

ioeric updated this revision to Diff 169814.Oct 16 2018, 6:08 AM
ioeric marked 3 inline comments as done.
  • refactor to use FileDistance and address review comments.
sammccall added inline comments.Oct 16 2018, 7:29 AM
clangd/AST.cpp
77 ↗(On Diff #169814)

It would be nice to have the invariant splitQualifiedName(printQualifiedName(D)) == printScope(D.getContext()) for Decl D.

That's why I was suggesting splitting NamedDecl::printQualifiedName.

Given the limited functionality here it should probably be called printNamespaceScope, lest someone try to reuse it to get a symbol's scope.

clangd/AST.h
51 ↗(On Diff #169814)

next to printQualifiedName?

clangd/CodeComplete.cpp
562 ↗(On Diff #169814)

do you also want to use printScope here?

clangd/Quality.cpp
250 ↗(On Diff #169814)

kepp -> keep

251 ↗(On Diff #169814)

might be clearer to explicitly prepend "/" rather than relying on normalization to do it?

261 ↗(On Diff #169814)

My intuition is that mostly the ranking should look like:

  1. results from queried namespaces
  2. results from nearby namespaces
  3. results from unrelated namespaces

That would require Source.Cost < (UpCost, DownCost), which isn't possible when they're 1.
WDYT?

262 ↗(On Diff #169814)

I thought we concluded down cost was likely to be significantly higher than up cost?

267 ↗(On Diff #169814)

This seems a little focused on the implementation rather than the intent...
maybe "The global namespace is not 'near' its children"?

272 ↗(On Diff #169814)

This seems like an odd rule, what's the intuition?

  • if the preferred scope is a::b::c:: then we're considering a::b:: and a:: to be equal, rather than preferring the former
  • you're defining the cost in terms of UpCost, but it's not clear why - is this just being used as a scale?
  • for the direct parent of the preferred scope, this is a no-op. So this only does anything when the preferred scope is at least 3 deep

As a first approximation, ; // just rely on up-traversals might be OK,
otherwise, I guess you're trying to boost these compared to just relying on up-traversals, so I'd expect this to look something like Cost = UpCost * (len(preferred) - len(S)) / 2

274 ↗(On Diff #169814)

Again, if this just means "parents of preferred are 1, others are 2", I'm not sure that sounds right.
A using-directive is explicit in the user's code, I'm not sure it should rank lower?

Happy enough to try this out as it is though, maybe just drop the references to UpCost.

276 ↗(On Diff #169814)

Why is this +=?

Seems like there's three cases here:

  • global is the preferred scope: I think the cost has to be 0 in this case, no?
  • global is another acceptable scope: some large fixed cost seems appropriate
  • global is not listed: no source to worry about
287 ↗(On Diff #169814)

0.6 seems too high for AnyScope matches, to me. And you're not distinguishing anyScope matches from bad path matches.

I think the linear scale is too flat: if you're in clang::clangd::, then clangd symbols will only get a 13% boost over clang ones.

Given the current UpCost == DownCost == 1, I'd consider something like

if (D == FileDistance::Unreachable)
  return 0.1;
return max(0.5, MaxBoost * pow(0.6, D))
382 ↗(On Diff #169814)

hmm, if you have no scopes in the query I think you're boosting the sema results and not the index ones. Intended?

clangd/Quality.h
83 ↗(On Diff #169814)

This is mechanism, not signals, and makes this header harder to understand.

Can you move the string-translation/wrapping part to FileDistance.h (returning Distance) and the boost part to Quality.cpp?

84 ↗(On Diff #169814)

clang-format

93 ↗(On Diff #169814)

why is this optional? does FileDistance not work with an empty set of roots?
If the idea is to avoid downranking everything if there are no scopes, QueryScopeProximity itself is already optional.

94 ↗(On Diff #169814)

unused

106 ↗(On Diff #169814)

you've now got scope-proximity and file-proximity info interleaved. These are analogous signals, but separate, so probably clearest to group each one.

ioeric edited the summary of this revision. (Show Details)Oct 17 2018, 1:17 AM
ioeric updated this revision to Diff 169962.Oct 17 2018, 1:20 AM
ioeric marked 18 inline comments as done.
  • Addressed review comments.

Thanks for the suggestions! After taking a closer look at boosting factors for other signals, I think I was being too conservative about boosting preferred scopes and penalizing non-preferred ones. I have tuned the parameters to make these more aggressive now according to your suggestion.

clangd/Quality.cpp
262 ↗(On Diff #169814)

Sorry, forgot to make that change. Made UpCost=1, DownCost=2

272 ↗(On Diff #169814)

As a first approximation, ; // just rely on up-traversals might be OK,

This sounds good to me.

276 ↗(On Diff #169814)

global is the preferred scope: I think the cost has to be 0 in this case, no?

I think we should still apply some penalty here because

  1. All projects can define symbols in the global scope. For example, if I work on a project without namespace, symbols from global namespaces are not necessarily relevant.
  2. A use pattern is that using-namespace is used in place of enclosing namespaces in implementation files.

Otherwise, done.

287 ↗(On Diff #169814)

Done except for the scope for unreachable paths.

I think 0.1 is too much penalty and would probably make cross-namespace completions not useful. As cross-namespace completion is not enabled by default, it should be safe to give it a higher score (0.4?).

382 ↗(On Diff #169814)

Not intended. Changed to only boost scopes if there are query scopes.

sammccall accepted this revision.Oct 17 2018, 2:18 AM

Awesome! Just nits.

clangd/CodeComplete.cpp
1224 ↗(On Diff #169962)

, if there are scopes?

clangd/FileDistance.cpp
206 ↗(On Diff #169962)

I think this sets MaxUpTraversals to -1 for the empty scope.
*probably* harmless in the end, but I think an explicit if !S.empty() might be clearer

clangd/FileDistance.h
125 ↗(On Diff #169962)

nit: It seems slightly odd to make this a unique_ptr just to make the constructor simpler - could also just define a helper function in the CC file that the constructor could call: ScopeDistance::ScopeDistance(ArrayRef<string> Scopes) : Distance(createScopeFileDistance(Scopes))

This revision is now accepted and ready to land.Oct 17 2018, 2:18 AM
ioeric updated this revision to Diff 169982.Oct 17 2018, 3:44 AM
ioeric marked an inline comment as done.
  • address comments
ioeric updated this revision to Diff 169983.Oct 17 2018, 3:48 AM
  • rebase
This revision was automatically updated to reflect the committed changes.