This should make all-scope completion more usable. Scope proximity for
indexes will be added in followup patch.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
clangd/CodeComplete.cpp | ||
---|---|---|
558 ↗ | (On Diff #169180) | does this do the right thing if it's anonymous or inline, or a parent is? For indexing, we call a function in AST.h, we should probably do something similar. The catch is that function relies on NamedDecl::printQualifiedName, maybe we need to factor out the relevant part so we can call it on a DeclContext. |
559 ↗ | (On Diff #169180) | shouldn't this be ""? That's a definite scope, not a failure to find one. |
clangd/Quality.cpp | ||
246 ↗ | (On Diff #169180) | If you're going to decide these numbers directly... a) I think you should just return a multiplier here, rather than a 0-1 score e.g. I'd suggest returning 1, 2, 1.5, 1, 1, 1.5, 0.3 or similar |
301 ↗ | (On Diff #169180) | I don't think "always in the accessible scope" is sufficient justification to give a score of 1. |
clangd/Quality.h | ||
92 ↗ | (On Diff #169180) | hmm, can/should we use FileProximity for this, and just transform the strings? This isn't going to cache for symbols sharing a namespace, isn't going to handle "symbol is in a child namespace of an included namespace", and some other combinations. |
clangd/CodeComplete.cpp | ||
---|---|---|
558 ↗ | (On Diff #169180) | Good catch. NamedDecl::printQualifiedName doesn't skip the (anonymous) part if the decl itself is anonymous, even with SuppressUnwrittenScope set. So I think we should explicitly filter those out here and then call printQualifiedName. |
559 ↗ | (On Diff #169180) | Make sense. I think we might still want to special-case the global scope (not here, maybe in proximity scoring) because:
using namespace clang; using namespace clang::clangd; Currently, if the enclosing scope is "", then we would treat it the same as the scopes from using-namespace directives. |
clangd/Quality.cpp | ||
246 ↗ | (On Diff #169180) |
I tried this. Unfortunately, allowing <1 multiplier would make the default value (for sema proximity) tricky. And [0,1] proximity score seems to be less confusing as it's consistent with the file proximity scale. If [0,1] isn't enough, we could still increase the upper bound?
It's ~1/3 only for the global scope vs non-preferred symbols, which seems reasonable to me. I worry penalizing non-preferred scopes too much can make cross-namespace completion less useable. |
clangd/Quality.h | ||
92 ↗ | (On Diff #169180) | It seems to me that it'd be less trivial to associate scopes with up/down traverses. Currently, the query scopes contain all enclosing namespaces, so the distance seems less important here. In general, I think the characteristics of scope proximity (e.g. all enclosing namespaces are already in query scopes) allow us to get away with something more trivial than file proximity?
This seems to be less a concern if we are not calculating up/down distance.
This can be covered by SymbolScope.startswith(QueryScope)? It might not work well for using-namespaces, but it's unclear how important that is. Still happy to consider FileDistance-approach as I might be missing something. |
(BTW I wouldn't expect to see improvements on eval here, as
clangd/CodeComplete.cpp | ||
---|---|---|
558 ↗ | (On Diff #169180) | can you move this function to AST.h? e.g. printScope? It seems like an obvious peer to printQualifiedName. |
clangd/Quality.cpp | ||
246 ↗ | (On Diff #169180) |
There are three types of symbols:
Note the category of results which we have neither sema nor index for doesn't actually exist. The framework allows for it and we should respect that as best we can, but it can't be the top priority. So can't we just change the SemaScopeProximityScore to bool SemaSaysInScope or something and evaluate as SemaSaysInScope ? MaxBoost : QueryScopeProximity ? evaluateUsingQueryScopeProximity() : 1?
Queried scopes (not most-preferred, and not the global scope!) get a boost of 1.6, while completely irrelevant scopes get a boost of 1. This is a relative penalty of only 37.5%. |
clangd/Quality.h | ||
92 ↗ | (On Diff #169180) |
Sure. But there cases like using namespace std and then _1 which should resolve to std::placeholders::_1.
Maybe we can. But we should only be trying to get away with it if it's some combination of {less complex, better, faster}. It doesn't seem clear to me that it's any of these.
Again, less of a concern, but not obviously a non-issue. With this implementation, you compare the scope string of each symbol to every query scope. With FileDistance, you hash the scope string and then almost-always hit the cache.
You've updated the code for this, but now you're not giving nested-within-query-scope any penalty compared to directly-in-query-scope. All of this can be implemented, but I don't really understand why - does reusing the impl not seem simpler? |
clangd/AST.cpp | ||
---|---|---|
77 ↗ | (On Diff #169814) | It would be nice to have the invariant splitQualifiedName(printQualifiedName(D)) == printScope(D.getContext()) for Decl D. That's why I was suggesting splitting NamedDecl::printQualifiedName. Given the limited functionality here it should probably be called printNamespaceScope, lest someone try to reuse it to get a symbol's scope. |
clangd/AST.h | ||
51 ↗ | (On Diff #169814) | next to printQualifiedName? |
clangd/CodeComplete.cpp | ||
562 ↗ | (On Diff #169814) | do you also want to use printScope here? |
clangd/Quality.cpp | ||
250 ↗ | (On Diff #169814) | kepp -> keep |
251 ↗ | (On Diff #169814) | might be clearer to explicitly prepend "/" rather than relying on normalization to do it? |
261 ↗ | (On Diff #169814) | My intuition is that mostly the ranking should look like:
That would require Source.Cost < (UpCost, DownCost), which isn't possible when they're 1. |
262 ↗ | (On Diff #169814) | I thought we concluded down cost was likely to be significantly higher than up cost? |
267 ↗ | (On Diff #169814) | This seems a little focused on the implementation rather than the intent... |
272 ↗ | (On Diff #169814) | This seems like an odd rule, what's the intuition?
As a first approximation, ; // just rely on up-traversals might be OK, |
274 ↗ | (On Diff #169814) | Again, if this just means "parents of preferred are 1, others are 2", I'm not sure that sounds right. Happy enough to try this out as it is though, maybe just drop the references to UpCost. |
276 ↗ | (On Diff #169814) | Why is this +=? Seems like there's three cases here:
|
287 ↗ | (On Diff #169814) | 0.6 seems too high for AnyScope matches, to me. And you're not distinguishing anyScope matches from bad path matches. I think the linear scale is too flat: if you're in clang::clangd::, then clangd symbols will only get a 13% boost over clang ones. Given the current UpCost == DownCost == 1, I'd consider something like if (D == FileDistance::Unreachable) return 0.1; return max(0.5, MaxBoost * pow(0.6, D)) |
382 ↗ | (On Diff #169814) | hmm, if you have no scopes in the query I think you're boosting the sema results and not the index ones. Intended? |
clangd/Quality.h | ||
83 ↗ | (On Diff #169814) | This is mechanism, not signals, and makes this header harder to understand. Can you move the string-translation/wrapping part to FileDistance.h (returning Distance) and the boost part to Quality.cpp? |
84 ↗ | (On Diff #169814) | clang-format |
93 ↗ | (On Diff #169814) | why is this optional? does FileDistance not work with an empty set of roots? |
94 ↗ | (On Diff #169814) | unused |
106 ↗ | (On Diff #169814) | you've now got scope-proximity and file-proximity info interleaved. These are analogous signals, but separate, so probably clearest to group each one. |
Thanks for the suggestions! After taking a closer look at boosting factors for other signals, I think I was being too conservative about boosting preferred scopes and penalizing non-preferred ones. I have tuned the parameters to make these more aggressive now according to your suggestion.
clangd/Quality.cpp | ||
---|---|---|
262 ↗ | (On Diff #169814) | Sorry, forgot to make that change. Made UpCost=1, DownCost=2 |
272 ↗ | (On Diff #169814) |
This sounds good to me. |
276 ↗ | (On Diff #169814) |
I think we should still apply some penalty here because
Otherwise, done. |
287 ↗ | (On Diff #169814) | Done except for the scope for unreachable paths. I think 0.1 is too much penalty and would probably make cross-namespace completions not useful. As cross-namespace completion is not enabled by default, it should be safe to give it a higher score (0.4?). |
382 ↗ | (On Diff #169814) | Not intended. Changed to only boost scopes if there are query scopes. |
Awesome! Just nits.
clangd/CodeComplete.cpp | ||
---|---|---|
1224 ↗ | (On Diff #169962) | , if there are scopes? |
clangd/FileDistance.cpp | ||
206 ↗ | (On Diff #169962) | I think this sets MaxUpTraversals to -1 for the empty scope. |
clangd/FileDistance.h | ||
125 ↗ | (On Diff #169962) | nit: It seems slightly odd to make this a unique_ptr just to make the constructor simpler - could also just define a helper function in the CC file that the constructor could call: ScopeDistance::ScopeDistance(ArrayRef<string> Scopes) : Distance(createScopeFileDistance(Scopes)) |