This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Enable partial namespace matches for workspace symbols
ClosedPublic

Authored by kadircet on Oct 5 2020, 2:56 AM.

Details

Summary

This will enable queries like "clangd::" to find symbols under clangd
namespace, without requiring full "clang::clangd::" qualification.

Since Fuzzyfind performs the search under all scopes and only boosts the symbols
from relevant namespaces, we might get symbols from non-matching namespaces.
This patch chooses to drop those as they clearly do not match the query.

Fixes https://github.com/clangd/clangd/issues/550.

Diff Detail

Event Timeline

kadircet created this revision.Oct 5 2020, 2:56 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 5 2020, 2:56 AM
kadircet requested review of this revision.Oct 5 2020, 2:56 AM
sammccall added inline comments.Oct 5 2020, 7:57 AM
clang-tools-extra/clangd/FindSymbols.cpp
97

Add a comment here (or elsewhere, I guess) about how qualified names are handled.

  • exact namespace: boosted on the index side
  • approximate namespace (e.g. substring): included using "all scopes" logic
  • non-matching namespace (no subtring match): filtered out from index-provided results
106

This expression doesn't make sense (except in context of the above code). Variables are cheap!

LeadingColons = consume_front("::");
Req.AnyScope = !LeadingColons
if (LeadingColons || !Names.first.empty())
  ...
110–115

Given the dynamic filter, we should request a greater multiple here (this time if anyscope && Names.first.empty is the right logic!)

This gives us a second class of regression :-) but we can tune the multiple to control it. I'd suggest 5 or so

122

nit: fuzzyfind

124

Pull out a approximateScopeMatch(scope, query) or so function?

Substring isn't quite right - fine for now with a fixme, but might as well pull out the function now so we don't make a mess when the code grows.

136

Would be nice to incorporate exact vs approximate scope match here: people complain a lot when an exact string match ranks below an approximate one.

I don't think ScopeDistance works as-is, because it assumes the reference point (query) is absolute.
You could consider NeedsFixIts or applying a multiplier to NameMatch (0.3?) or InBaseClass (all of these are a stretch I guess)

nridge added a subscriber: nridge.Oct 5 2020, 8:22 AM
kadircet updated this revision to Diff 296402.Oct 6 2020, 3:31 AM
kadircet marked 6 inline comments as done.
  • Perform sub-scope matching rather than substring match for partial namespaces.
  • Apply a penalty for partially matching namespaces.
sammccall accepted this revision.Oct 6 2020, 4:07 AM
sammccall added inline comments.
clang-tools-extra/clangd/FindSymbols.cpp
45

I had a little trouble following this...
It seems a little simpler (fewer vars to track) if we avoid the up-front split on scopes.

assert(Scope.empty() || Scope.endswith("::")); // or handle in some way
// Walk through Scope, consuming matching tokens from Query.
StringRef First;
while (!Scope.empty() && !Query.empty()) {
  tie(First, Scope) = Scope.split("::");
  if (Query.front() == First)
    Query = Query.drop_front();
}
return Query.empty(); // all components of query matched

in fact we can avoid preprocessing query too:

// Query is just a StringRef
assert(Scope.empty() || Scope.endswith("::")); // or handle in some way
assert(Query.empty() || Query.endswith("::"));

// Walk through Scope, consuming matching tokens from Query.
StringRef First;
while (!Scope.empty() && !Query.empty()) {
  tie(First, Scope) = Scope.split("::");
  Query.consume_front(StringRef(First.data(), First.size() + 2) /*Including ::*/);
}
return Query.empty(); // all components of query matched
clang-tools-extra/clangd/unittests/FindSymbolsTests.cpp
274

Partil -> Partial

This revision is now accepted and ready to land.Oct 6 2020, 4:07 AM
kadircet added inline comments.Oct 6 2020, 9:54 AM
clang-tools-extra/clangd/FindSymbols.cpp
45

Yes but they would do different things. I believe the confusion is caused by usage of sub-scope without a clear definition. The codes you've suggested are performing sub-sequence matches rather than sub-string(i.e. we are looking for a contigious segment in Scope that matches Query).

I believe a query of the form a::c:: shouldn't be matched by a::b::c::. I can try simplifying the logic, but it would be nice to agree on the behaviour :D.

Sorry if I miscommunicated this so far.

sammccall added inline comments.Oct 7 2020, 10:23 AM
clang-tools-extra/clangd/FindSymbols.cpp
45

Ah right, I was indeed misreading the code. Let's have some definitions...

given query a::b::Foo with scope a::b::

a::b::W::a::b::a::X::b::a::b::Y
exact*
prefix**
suffix**
substring***
subsequence****

These support correcting different types of "errors":

  • exact: none
  • prefix: may omit namespaces immediately before Foo
  • suffix: query may be rooted anywhere (other than global ns)
  • substring: query rooted anywhere, omit namespaces before Foo
  • subsequence: may omit any component

We know "exact" is too strict.

I think "prefix" and by extension "substring" aren't particularly compelling rules as the "immediately before Foo" requirement is arbitrary.
Why does a::b::Foo match a::b::c::Foo and not a::c::b::Foo? In each case we've omitted a namespace inside the query, the only difference is what it's sandwiched between.

Suffix makes intuitive sense, it accepts strings that make sense *somewhere* in the codebase.
Subsequence makes intuitive sense too: you're allowed to forget uninteresting components, similar to how fuzzy-match lets you omit uninteresting words.

I'd prefer one of those and don't really mind which - I'd assumed subsequence was intended. Suffix is way easier to implement though :-)

kadircet updated this revision to Diff 296992.Oct 8 2020, 9:19 AM
kadircet marked 3 inline comments as done.
  • Switch from substring to subscope matching.
clang-tools-extra/clangd/FindSymbols.cpp
45

Thanks for the nice table :D

I was mainly shying away from subsequence as I thought it would be too loose, but I suppose it is okay thinking about the fuzzyfind logic. So I've changed the implementation to be that way, while adding some tests.

sammccall accepted this revision.Oct 9 2020, 1:17 AM

Yeah, we can fairly easily make this tighter if it's noisy. My suspicion is people will rather complain about ranking than these results being totally irrelevant.
Nice that we now have workspace/symbols users that can send this feedback!

clang-tools-extra/clangd/FindSymbols.cpp
47

there's no need for these to be special cases, the loop handles them correctly

It simplifies the asserts slightly, but seems like false economy

kadircet updated this revision to Diff 297142.Oct 9 2020, 1:21 AM
kadircet marked an inline comment as done.
  • Move if checks into asserts.
This revision was landed with ongoing or failed builds.Oct 9 2020, 1:21 AM
This revision was automatically updated to reflect the committed changes.