This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implementation of workspace/symbol request
ClosedPublic

Authored by malaperle on Mar 25 2018, 2:39 PM.

Details

Summary

This is a basic implementation of the "workspace/symbol" request which is
used to find symbols by a string query. Since this is similar to code completion
in terms of result, this implementation reuses the "fuzzyFind" in order to get
matches. For now, the scoring algorithm is the same as code completion and
improvements could be done in the future.

The index model doesn't contain quite enough symbols for this to cover
common symbols like methods, enum class enumerators, functions in unamed
namespaces, etc. The index model will be augmented separately to achieve this.

Diff Detail

Event Timeline

malaperle created this revision.Mar 25 2018, 2:39 PM
ilya-biryukov added a subscriber: sammccall.

Adding @sammccall, he will definitely want to take a look at index-related changes.
On a high level, the approach seems just right.
A few questions immedieately that came to mind:

  • Unconditionally storing much more symbols in the index might have subtle performance implications, especially for our internal use (the codebase is huge). I bet that internally we wouldn't want to store the symbols not needed for completion, so we'll probably need a switch to disable storing them in the indexing implementation. But let's wait for Sam to take a look, he certainly has a better perspective on the issues there.
  • Current fuzzyFind implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature? (not in this patch, rather in the long term).
  • Have you considered also allowing '.' and ' ' (space) as separators in the request? Having additional separators doesn't really hurt complexity of the implementation, but allows to switch between tools for different languages easier.

E.g., std.vector.push_back and std vector push_back could produce the same matches as std::vector::push_back.

clangd/ClangdServer.h
162

NIT: use Callback typedef.

clangd/WorkspaceSymbols.cpp
165 ↗(On Diff #139749)

We have FuzzyMatcher, it can produce decent match scores and is already used in completion.
Any reason not to use it here?

clangd/index/Index.h
141 ↗(On Diff #139749)

NIT: remove empty lines to be consistent with the rest of the struct.

  • Unconditionally storing much more symbols in the index might have subtle performance implications, especially for our internal use (the codebase is huge). I bet that internally we wouldn't want to store the symbols not needed for completion, so we'll probably need a switch to disable storing them in the indexing implementation. But let's wait for Sam to take a look, he certainly has a better perspective on the issues there.

I'm a bit surprised that internally you do not want symbols beyond the ones for completion. We have a lot more features in mind that will make the index much bigger, like adding all occurrences (maybe not in the current YAML though). But having options to control the amount of information indexed sounds good to me as there can be a lot of different project sizes and there can be different tradeoffs. I had in mind to an option for including/excluding the occurrences because without them, you lose workspace/references, call hierarchy, etc, but you still have code completion and workspace/symbol, document/symbol, etc while making the index much smaller. But more options sounds good.

  • Current fuzzyFind implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature? (not in this patch, rather in the long term).

I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.

  • Have you considered also allowing '.' and ' ' (space) as separators in the request? Having additional separators doesn't really hurt complexity of the implementation, but allows to switch between tools for different languages easier.

E.g., std.vector.push_back and std vector push_back could produce the same matches as std::vector::push_back.

I haven't thought of that and I think it's a good idea. Sounds like a good, isolated thing to do in a separate patch.

malaperle added inline comments.Mar 26 2018, 7:17 AM
clangd/WorkspaceSymbols.cpp
165 ↗(On Diff #139749)

I think it was giving odd ordering but let me try this again and at least document the reason.

Very nice! I'd like to reduce the scope of the initial patch, which seems to be possible, so we can review the details but not get bogged down too much.

  • Unconditionally storing much more symbols in the index might have subtle performance implications, especially for our internal use (the codebase is huge). I bet that internally we wouldn't want to store the symbols not needed for completion, so we'll probably need a switch to disable storing them in the indexing implementation. But let's wait for Sam to take a look, he certainly has a better perspective on the issues there.

I'm a bit surprised that internally you do not want symbols beyond the ones for completion.

FWIW, I think we do, it just needs to be done carefully (with the right semantic filters at query time, and metadata at indexing time).
Can we split this patch up (it's pretty large anyway) and land the workspaceSymbols implementation first, without changing the indexer behavior?

I think the indexer changes are in the right direction, but I think "ForCompletion" isn't quite enough metadata, and don't want to block everything on that.

We have a lot more features in mind that will make the index much bigger, like adding all occurrences (maybe not in the current YAML though).

(tangential)
FWIW, I think *that* might be a case where we want quite a different API - I think this is a file-major index operation (handful of results per file) rather than a symbol-major one (results per symbol is essentially unbounded, e.g. std::string). At the scale of google's internal codebase, this means we need a different backend, and this may be the case for large open-source codebases too (I know chromium hits scaling issues with indexers they try).
There are also other operations where results are files rather than symbols: which files include file x, etc.

But having options to control the amount of information indexed sounds good to me as there can be a lot of different project sizes and there can be different tradeoffs. I had in mind to an option for including/excluding the occurrences because without them, you lose workspace/references, call hierarchy, etc, but you still have code completion and workspace/symbol, document/symbol, etc while making the index much smaller. But more options sounds good.

If we add options, someone has to actually set them.
Anything that's too big for google's internal index or similar can be filtered out in the wrapping code (mapreduce) if the indexer exposes enough information to do so.
We shouldn't have problems with the in-memory dynamic index.
Features that are off in the default static indexer configuration won't end up getting used much.

So I can see a use for filters where the command-line indexer (YAML) would run too slow otherwise, but a) let's cross that bridge when we come to it and b) there's lots of low-hanging fruit there - the YAML format itself should be considered a placeholder.

  • Current fuzzyFind implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature? (not in this patch, rather in the long term).

I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.

+1, I think experience with workspaceSymbols will help us answer this question.

clangd/ClangdLSPServer.cpp
107

This is the analogue to the one on completion that we currently ignore, and one on symbol corresponding to the documentSymbol call which isn't implemented yet.

I think the value of the extended types is low and it might be worth leaving out of the first version to keep complexity down.
(If we really want it, the supporting code (mapping tables) should probably be in a place where it can be reused by documentSymbol and can live next to the equivalent for completion...)

clangd/ClangdLSPServer.h
94

This comment is just the var name, delete?

95

actually, I'm not sure this is worth tracking at all.
Dynamic index is on by default, and the worst that can happen is some dynamically-configured clients will show a menu option in a non-default configuration, but it won't actually work.
I don't think fixing this is worth the layering violation

clangd/ClangdServer.h
160

onWorkspaceSymbol -> workspaceSymbols

clangd/WorkspaceSymbols.cpp
1 ↗(On Diff #139749)

as noted elsewhere, adjustKindToCapability etc need to be sharable, so I'm not sure this file structure quite works - it's named too narrowly to have any shared functionality, but if you pull those functions out there's only one left.

Maybe we could rename this FindSymbols.cpp, and then documentSymbols will also live here?

165 ↗(On Diff #139749)

I think it's going to be worth pulling out a scoring concept/function here, we can't reuse the priority from the sema codecompletion, but the server's going to provide useful scoring signals.
I'm not sure how much we want to keep this in sync with code completion overall, though.

clangd/WorkspaceSymbols.h
37 ↗(On Diff #139749)

so reading all the files we return from the index is obviously going to be bad, we screwed up the interfaces :-)

From discussing with people who've dealt with similar things, we're probably going to need to store both row/col and offset in the index. I'll try to fix this soon...
What you're doing for now is fine, but should probably have loud fixmes on it.

clangd/index/Index.h
143 ↗(On Diff #139749)

This strikes me as conflating a bunch of ideas that are independently important (e.g. is the symbol top-level, is it visible outside the declaring file). We should design this carefully after considering LSP features that might use the index.

clangd/tool/ClangdMain.cpp
105

the -completion-limit was mostly to control rollout, I'm not sure this needs to be a flag. If it does, can we make it the same flag as completions (and call it -limit or so?)

122

if we're renaming this, just -index?

  • Current fuzzyFind implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature? (not in this patch, rather in the long term).

I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.

+1, I think experience with workspaceSymbols will help us answer this question.

I was using an IDE that had fuzzy find for an equivalent of workspaceSymbols and found it to be an amazing experience. And having consistent behavior between different features is really nice.
Good ranking is a key to it being useful, though. If when typing Draft you get DocumentRangeFormattingParams ranked higher than DraftMgr that's a bug in FuzzyMatcher. If you have some not-very-nice results at the end of the list, this shouldn't be a problem in most cases.

I'm highly in favor of enabling fuzzy matching for workspaceSymbols.

I'm highly in favor of enabling fuzzy matching for workspaceSymbols.

At lest for the name themselves. Non-fuzzy matching of qualifiers does not look that important.

malaperle marked 9 inline comments as done.Mar 27 2018, 12:02 PM

Can we split this patch up (it's pretty large anyway) and land the workspaceSymbols implementation first, without changing the indexer behavior?

I think the indexer changes are in the right direction, but I think "ForCompletion" isn't quite enough metadata, and don't want to block everything on that.

Sounds good. ForCompletion was more of a short term hack in my mind, we can try to split it and make proper metadata.

We have a lot more features in mind that will make the index much bigger, like adding all occurrences (maybe not in the current YAML though).

(tangential)
FWIW, I think *that* might be a case where we want quite a different API - I think this is a file-major index operation (handful of results per file) rather than a symbol-major one (results per symbol is essentially unbounded, e.g. std::string). At the scale of google's internal codebase, this means we need a different backend, and this may be the case for large open-source codebases too (I know chromium hits scaling issues with indexers they try).
There are also other operations where results are files rather than symbols: which files include file x, etc.

Yeah, there needs to be an API to fetch things per file, such as the dependencies and the occurrences it contains (for updating when the file is deleted, etc). I'm not sure there needs to be an API that queries a file for a given USR though, I think that could be transparent to the caller and an implementation detail. The results could be grouped by file though. Good discussion for future patches :)

But having options to control the amount of information indexed sounds good to me as there can be a lot of different project sizes and there can be different tradeoffs. I had in mind to an option for including/excluding the occurrences because without them, you lose workspace/references, call hierarchy, etc, but you still have code completion and workspace/symbol, document/symbol, etc while making the index much smaller. But more options sounds good.

If we add options, someone has to actually set them.
Anything that's too big for google's internal index or similar can be filtered out in the wrapping code (mapreduce) if the indexer exposes enough information to do so.
We shouldn't have problems with the in-memory dynamic index.
Features that are off in the default static indexer configuration won't end up getting used much.

I think everything should be ON by default but we can run into issues that the user wants to reduce index size and help indexing speed. We had such options in CDT since there are quite a few things that can make it big (macro expansion steps, all references, etc). But this can be added as needed later.

So I can see a use for filters where the command-line indexer (YAML) would run too slow otherwise, but a) let's cross that bridge when we come to it

Sounds good.

  • Current fuzzyFind implementation can only match qualifiers strictly (i.e. st::vector won't match std::vector). Should we look into allowing fuzzy matches for this feature? (not in this patch, rather in the long term).

I'm not sure, I'm thinking there should be a balance between how fuzzy it it and how much noise it generates. Right now I find it a bit too fuzzy since when I type "Draft" (to find DraftMgr), it picks up things like DocumentRangeFormattingParams. Adding fuzziness to the namespace would make this worse. Maybe with improved scoring it won't matter too much? I'll try FuzzyMatcher and see.

+1, I think experience with workspaceSymbols will help us answer this question.

I was using an IDE that had fuzzy find for an equivalent of workspaceSymbols and found it to be an amazing experience. And having consistent behavior between different features is really nice.
Good ranking is a key to it being useful, though. If when typing Draft you get DocumentRangeFormattingParams ranked higher than DraftMgr that's a bug in FuzzyMatcher. If you have some not-very-nice results at the end of the list, this shouldn't be a problem in most cases.

I'm highly in favor of enabling fuzzy matching for workspaceSymbols.

I think it will be fine once we address the bug(s). As long as the exact matches are always first then there's no problem.

clangd/ClangdLSPServer.cpp
107

I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it. So would it be OK to move to FindSymbols.cpp (the new common file for document/symbol and workspace/symbol)?

clangd/ClangdLSPServer.h
95

I wonder if the "enable-index-based" is even necessary...

clangd/WorkspaceSymbols.cpp
1 ↗(On Diff #139749)

FindSymbols.cpp sounds good, I was thinking of moving things around once I introduce document/symbols but we might as well name this appropriately in the first place.

165 ↗(On Diff #139749)

I think I'm going to remove the sort here and try to investigate the FuzzyMatcher separately. It seems to work correctly for code completion but for workspace symbols there are many more results and it looks like the scoring needs to be tweaked.

clangd/WorkspaceSymbols.h
37 ↗(On Diff #139749)

I added a FIXME in the implementation. You should probably chime in https://reviews.llvm.org/D39050, about storing more things in the index to not read all files :-)

clangd/index/Index.h
143 ↗(On Diff #139749)

Agreed. I was hesitant between adding a bunch of additional information/fields or adding just one for simplicity and revisit when we need more features. But it would be best if this could be well defined from the get go and was hoping for some suggestions on how to go about this. Doing this in a separate patch makes a lot of sense to me.

clangd/tool/ClangdMain.cpp
105

I think it's quite similar to "completions", when you type just one letter for example, you can get a lot of results and a lot of JSON output. So it feels like the flag could apply to both completions and workspace symbols. How about -limit-resuts? I think just -limit might be a bit too general as we might want to limit other things.

malaperle marked 4 inline comments as done.

Split the patch so that this part only has the workspace/symbol part
and the index model will be updated separately.

malaperle edited the summary of this revision. (Show Details)Mar 27 2018, 1:08 PM

Have you considered also allowing '.' and ' ' (space) as separators in the request? Having additional separators doesn't really hurt complexity of the implementation, but allows to switch between tools for different languages easier.

I would suggest allowing patterns containing space to match text without space, e.g. pattern a b can match text aB. The initial character of each word in the pattern should be seen as a Head position. This behavior matches many fuzzy matching plugins used in Emacs and Vim.

clangd/SourceCode.cpp
108

May I ask a question about the conversion between SourceLocation and LSP location? When the document is slightly out of sync with the indexed version, what will be returned?

MaskRay added inline comments.Mar 27 2018, 1:31 PM
clangd/tool/ClangdMain.cpp
105

Can these options be set by LSP initialization options?

The index changes are moved here: https://reviews.llvm.org/D44954 I haven't changed the patch yet though, I just removed it from this one.

malaperle added inline comments.Mar 28 2018, 2:00 PM
clangd/SourceCode.cpp
108

I forgot to cover the case of the unsaved files, which are indexed in memory. It that what you mean? I'll update the patch to address by using the DraftStore when available. There is also the case where the file is not opened but outdated on disk. I don't think we can do much about it but make sure it doesn't crash :) At worst, the location might be off, and navigation will be momentarily affected, until the index can be updated with the file change. This is what I've noticed in other IDEs as well.

clangd/tool/ClangdMain.cpp
105

They could. Are you say they *should*? We could add it in DidChangeConfigurationParams/ClangdConfigurationParamsChange (workspace/didChangeConfiguration) if we need to. I haven't tried or needed to add it on the client side though. It's not 100% clear what should go in workspace/didChangeConfiguration but I think it should probably things that the user would change more often at run-time. I'm not sure how frequent this "limit" will be changed by the user.

malaperle updated this revision to Diff 140272.Mar 29 2018, 9:18 AM

Use the DraftStore for offset -> line/col when there is a draft available.

I realized that using the draft store has limited improvement for now because not many symbol locations come from main files (.cpp, etc) and when headers are modified, the main files are not reindexed right now. So the draft store will give better location for example when navigating to a function definition which moved in the unsaved main file. But it will be more general once header updates are handled better.

MaskRay added inline comments.Mar 29 2018, 9:30 AM
clangd/FindSymbols.cpp
32

This std::find loop adds some overhead to each candidate... In my experience the client usually doesn't care about the returned symbol kinds, they are used to give a category name. You can always patch the upstream to add missing categories.

This is one instance where LSP is over specified. nvm I don't have strong opinion here

clangd/Protocol.cpp
206

This copy-pasting exposes another problem that the current fromJSON toJSON approach is too verbose and you may find other space-efficient serialization format convenient .... Anyway, they can always be improved in the future.

clangd/tool/ClangdMain.cpp
235

I know command line options are easy for testing purpose but they are clumsy for users. You need a "initialization option" <-> "command line option" bridge.

This is fantastic, really looking forward to using it.
I'm going to need to give up on vim at this rate, or do some serious work on ycm.

Most significant comments are around the new functions in SourceLocation, and whether we can keep the type-mapping restricted to the LSP layer.

clangd/ClangdLSPServer.cpp
107

I think having Struct and EnumMember is nice and for sure once Macros is there we will want to use it.

OK, fair enough.

I think what bothers me about this option is how deep it goes - this isn't really a request option in any useful sense.
I'd suggest the "C++ layer" i.e. ClangdServer should always return full-fidelity results, and the "LSP layer" should deal with folding them.

In which case adjustKindToCapability should live somewhere like protocol.h and get called from this file - WDYT?

clangd/ClangdServer.h
160

nit: add a comment for this API, e.g. /// Retrieve the top symbols from the workspace matching a query

clangd/FindSymbols.cpp
29

Can we rather have this condition checked when the client sets the supported symbols, and remove this special knowledge here?

33

this seems gratuituously inefficient - I guess it doesn't matter that much, but can't we store the supported kinds as a std::bitset or so?

37

nit: if we think this will be extended, a switch might be clearer

43

This code shouldn't need to handle this case. The LSP specifies the default set of supported types if it's not provided, so ClangdLSPServer should enure we always have a valid set

115

why Query.empty()?
In practice maybe showing results before you type is bad UX, but that seems up to the editor to decide.

118

nit: move this below the next "paragraph" where it's used?

131

nit: this is Names.first.consume_front("::")
Might be clearer as "FuzzyFind doesn't want leading :: qualifier" to avoid implication, that global namespace is special, particularly because...

131

actually the global namespace *is* special :-)

IMO If you type Foo, you should get results from any namespace, but if you type ::Foo you should only get results from the global namespace.

So something like:

if (Names.first.consume_front("::") || !Names.first.empty())
  Req.Scopes = {Names.first};

though more explicit handling of the cases may be clearer

142

Ugh, I never know what the right thing is here, decl/def split is a mess, and there's lots of different workflows.

I don't have any problem with the choice made here (personally I'd probably prefer canonical decl if we had working gotodefn), but it's subtle, so I think it should be a bit louder :-)
Maybe just a comment // Prefer the definition over e.g. a function declaration in a header - that's the most important case that differs.

If we see this decision popping up again maybe we should put it as a function on Symbol to call out/codify this decision?
e.g. SymbolLocation Symbol::NavigationLocation() { return Definition ? Definition : CanonicalDeclaration; }
but up to you' that's probably premature

157

nit: move into if?

160

maybe add an assertion that the adjusted SK is supported? (This should be easier if the supportedSymbolKinds is always provided)

163

The most obvious reading of the spec to me, and the one that matches the language servers I found, is that the containerName should be "foo::bar", not "foo::bar::"

clangd/Protocol.cpp
194

there's two right ways to do this which are conflated a bit here:

  • make the field of type T, give it a meaningful default value, and map it optionally (ignore return value of map())
  • make the field of type Optional<T>, let the default be none, and require map() to succeed (which it will if the field is missing)

So I think you want O && O.map("valueSet", R.valueSet), and below. This has the advantage that if valueSet is present in the message but has an invalid value, parsing will fail.

(Yes, this is confusing and not at all ideal, sorry...)

clangd/SourceCode.cpp
79

This function seems to be biting off more than it can chew - it's really hard to abstract this away and its callsites don't seem to fit the same pattern.

I'd probably suggest moving it back and having FindSymbols do just the part it needs locally.

First, SourceRange doesn't have a consistent set of semantics stronger than "a pair of sourcelocations". The first element always indicates the first point in the range, but the second element can be any of:

  • the first character outside the range (you're assuming this, which I think is actually the rarest in clang)
  • the last character inside the range
  • the first character of the last token inside the range (this is the least useful and I think most common!)

Secondly, there's several ways to interpret a location and the choice is somewhat arbitrary here. (in fact the choice isn't consistent even within the function - you may take the spelling location sometimes and the expansion location other times).

Third, the failure mechanism is opaque - this might fail and we don't describe how, so callsites are going to depend on the implementation details, and possibly end up doing redundant fallbacks.

Maybe most importantly, in the long run this is doing complicated and redundant work to push workspaceSymbols through this function:

  • constructing a path from SourceLocation is useful to xrefs, but documentHighlights had the path in the first place!
  • converting offset->row/column for workspaceSymbols is more simply done without SourceManager (VFS + offsetToPosition) and will go away entirely when we fix the index.
clangd/index/SymbolCollector.cpp
320

the message just repeats the assert expression, remove?
(I think this would be slightly clearer if it was just below OS.flush)

clangd/tool/ClangdMain.cpp
105

Static configuration that's available at construction time is much easier to work with, it follows the lifetime of the C++ objects. The LSP-initialization is an annoying fact of life, but not too bad since no "real work" needs to happen first.

If a parameter actually needs to be dynamic, we should strive to specify it at request-time, that's also a simple logical model. If we're going to have to extend LSP for this stuff anyway, we might as well extend the requests.

The didChangeConfiguration is very difficult to work with in my view, and we should avoid adding more options there at all possible.

Aside: several times I've attempted to clean up the GlobalCompilationDatabase and run up against this requirement that breaks most attempts at abstraction. (I'm curious why just restarting clangd when the compilation database is overridden doesn't work). At some point as we add the abstractions needed to track changes on disk and dependent rebuilds, we'll get a chance to revisit how this works.

unittests/clangd/FindSymbolsTests.cpp
21

nit: please move to operator<< declared in protocol.h

23

(if stripping :: from containername, don't forget to update here)

48

(nit: why are all these on the heap?)

54

nit: it's rare to need SetUp, the constructor is fine (and consistent with other tests)

96

these negative tests spell the exact name of the symbol and assert the results are empty.
They'll pass if the name is spelled/formatted wrong - maybe prefer getSymbols("l") or getSymbols("")?

100

combine with NoLocals?

121

Can we hold off on these tests until the functionality is in? (though do include *something* in a namespace)

180

maybe add more namespace cases to this:

namespace ans1 { int ai1; namespace ans2 { int ai2 } }

"a" -> ans1, ans1::i1, ans1::ans2, ans1::ans2::i2
"::a" -> ans1
"ans1::" -> ans1::ans2, ans1::ai1
"::ans1::ans2::" -> ans1::ans2::ai2

210

can you include a symbol inside a namespace, and verify it doesn't get returned? (or combine with other namespaces queries as suggested above)

246

the enums/union/inlinenamespace cases seem to be basically testing symbolcollector again - the only differences here are on the indexing side. So I think they're ok to drop.

malaperle added inline comments.Mar 29 2018, 11:00 AM
clangd/FindSymbols.cpp
32

I have a client that throws an exception when the symbolkind is not known and the whole request fails, so I think it's worth checking. But if it's too slow I can look at making it faster. Unfortunately, I cannot patch any upstream project :)

MaskRay added inline comments.Mar 30 2018, 9:49 AM
clangd/FindSymbols.cpp
32

https://github.com/gluon-lang/languageserver-types/blob/master/src/lib.rs#L2016

LanguageClient-neovim returns empty candidate list if one of the candidates has unknown SymbolKind. Apparently they should be more tolerant and there is an issue tracking it.

MaskRay added inline comments.Mar 30 2018, 9:53 AM
clangd/FindSymbols.cpp
32

If it was not an internal confidential client, I would like to know its name, unless the confidentiality includes the existence of the client.

malaperle marked an inline comment as done.Apr 1 2018, 8:29 PM
malaperle added inline comments.
clangd/FindSymbols.cpp
32

Sorry, I should have explained a bit more my thought. I see that an older-ish version of Monaco is not handling unknown symbols kinds well. I don't really know if it's our integration with Monaco or Monaco itself (the IDE is Theia). But my thought is that there are clients out there that follow the protocol "correctly" by choosing not to handle those unknown symbol kinds gracefully. I am not that concerned about a single client in particular, just general support of clients out there. As a server, I think it makes sense to support clients that follow the protocol, especially since 1.x and 2.x are not that old. I don't think it's the crux of the complexity of this patch.

hokein added a subscriber: hokein.Apr 3 2018, 6:51 AM
hokein added inline comments.Apr 3 2018, 8:12 AM
clangd/ClangdLSPServer.cpp
270

InvalidParams doesn't match all cases where internal errors occur. Maybe use ErrorCode::InternalError?

clangd/FindSymbols.cpp
146

I think we may want to report the error to client instead of just logging them.

clangd/SourceCode.cpp
140

nit: this comment is duplicated with the one in header.

clangd/SourceCode.h
58

We should use llvm::Expected?

The function needs a comment documenting its behavior, especially on the unsaved file content.

60

nit: name FilePath or FileName, File seems to be a bit confusing, does it mean FileContent or FileName?

malaperle updated this revision to Diff 141028.Apr 4 2018, 12:56 PM
malaperle marked 27 inline comments as done.

Address comments.

clangd/ClangdLSPServer.cpp
107

Sounds good, I moved it.

270

Sounds good.

clangd/FindSymbols.cpp
29

"supportedSymbolKinds" is now populated with those base symbol kinds at initialization.

43

My thought is that if we start dealing with one of those:

Object = 19,
Key = 20,
Null = 21,
Event = 24,
Operator = 25,
TypeParameter = 26

and it's an older client that only supports up to "Array = 18", then we can provide a fallback, otherwise the client my not handle the unknown kind gracefully. But we can add this later if necessary I guess.

115

vscode sends an empty query and it also doesn't show all symbols in Typescript for example, so that's why I thought that would be best.

131

Sounds good. I think this works well.

142

I think perhaps this could be a user option and making that decision (reading the option and returning the decl) would make sense in a function. Perhaps this can be added a bit later.

146

I'm not sure, wouldn't that mean aborting the whole request? I think by logging only, if one symbol has a problem and not the others, the request could still provide a lot of good results.

160

I added an assertion in ClangdLSPServer, where adjustKindToCapability is now called.

163

Sounds good. I had to create a temp std::string *and* StringRef, yuck.

clangd/SourceCode.cpp
79

This function seems to be biting off more than it can chew - it's really hard to abstract this away and its callsites don't seem to fit the same pattern.
I'd probably suggest moving it back and having FindSymbols do just the part it needs locally.

Sure, I'll move it back/ make a local one. In the end the one in xrefs will be quite similar, especially once we add the proper "go to definition" (soon!) but maybe it's less confusing to duplicate some code here.

you may take the spelling location sometimes and the expansion location other times

What do you mean here? I see spelling but not expansion, maybe I missed it.

converting offset->row/column for workspaceSymbols is more simply done without SourceManager (VFS + offsetToPosition) and will go away entirely when we fix the index.

More simply but I'm not 100% sure it's better. I would be good to somehow reuse the logic in SourceManager's ComputeLineNumbers as it seems to be faster (SSE?) and handles more cases (\r?).
If I were to change the code to use VFS, how would I go about it? I would use FileSystem::getBufferForFile but to prevent reloading the same buffer over and over, then I would need to keep a map of filenames -> buffers, which pretty much FileManager/SourceManager provides already.

clangd/SourceCode.h
58

I removed this one.

60

removed

clangd/index/SymbolCollector.cpp
320

The assert was just moved from splitQualifiedName but I can remove the message and move it.

clangd/tool/ClangdMain.cpp
105

(I'm curious why just restarting clangd when the compilation database is overridden doesn't work).

Restarting Clangd would mean that the client has the burden to know that it has to restart Clangd and "reopen" all the files, which I don't think is specified in the LSP. I think the client would be free to just not restart a "crashed" server. VSCode does restart a crashing server (up to 5 times?) but that's arbitrary I think.
We also want to soon look at handling the CDB changing through the "workspace/didChangeWatchedFiles" and track which flags for which files actually changes, which means some files would be reparsed and others not.

unittests/clangd/FindSymbolsTests.cpp
48

I needed to set ServerOpts.BuildDynamicSymbolIndex = true; before creating the ClangServer. But I extracted this to a test-file-specific optsForTest() and used that instead. Much nicer.

246

My reasoning was that this was also making sure that getWorkspaceSymbols worked for those cases, regardless of how it was implemented, i.e., being dependent on SymbolCollector. It would be possible that getWorkspaceSymbols breaks one of these cases, although unlikely given its *current* implementation. I think a good blend of tests is both some tests that take into consideration the implementation and also tests that do not consider the implementation but rather expected functionality. I guess in that sense, I was trying to add a bit more black box testing of functionality.

malaperle added inline comments.Apr 4 2018, 1:36 PM
clangd/ClangdLSPServer.cpp
109

I'd like to add some test to exercise the changes in ClangdLSPServer.cpp and Protocol.cpp but it's not straightforward to do nicely. Using a "lit" test, I cannot have a header and since symbols in the main file are not collected then it's not useful. If I make a gtest, I have to feed it the lsp server with stdin and check the stdout which is a bit messy, but doable.

sammccall accepted this revision.Apr 11 2018, 6:01 AM

Sorry about the delay Marc-André - I got stuck on "how to test ClangdLSPServer" and then distracted!
But this looks great.

clangd/ClangdLSPServer.cpp
109

Yeah, we don't have a good way to test ClangdLSPServer :-( I would like to fix that, but I don't know of any easy fixes.

This is kind of gross, but do standard library includes work from our lit tests? You could #include <map> and then test using some symbols you know are there...

111

I'd like to be slightly less hostile than this to (broken) clients that fail to call initialize.
As the patch stands they'll get an empty SupportedSymbolKinds, and we'll crash if they call documentSymbols.

Instead I'd suggest pulling out defaultSymbolKinds() and initializing to that in the constructor, and then overriding with either SpecifiedSymbolKinds or SpecifiedSymbolKinds | defaultSymbolKinds() here.

clangd/ClangdServer.h
162

I think it would probably be more consistent with other functions to just take int limit here. I'm not sure CodeCompletion is an example we want to emulate.

ClangdLSPServer might even just grab it from the code completion options, since that's the behavior we actually want.

Up to you, though.

163

maybe a short FIXME here too e.g. "remove param when the index has line/col"

clangd/FindSymbols.cpp
43

Ah yes I agree, fallbacks are good. I just meant that ClangdLSPServer should ensure that we actually have the bitset populated with the right values. I think it looks good now.

clangd/Protocol.cpp
209

This doesn't actually seem unreachable (or won't stay that way), maybe log and return Null or something like that? (Wow, there's really no catch-all option for this mandatory enum...)

This revision is now accepted and ready to land.Apr 11 2018, 6:01 AM

(BTW @hokein has D45513 to add row/col to the index, which will allow all the file-reading stuff to be cleaned up, but no need to wait on that since it's all working now).

malaperle updated this revision to Diff 142112.Apr 11 2018, 9:21 PM
malaperle marked 2 inline comments as done.

Address comments.

(BTW @hokein has D45513 to add row/col to the index, which will allow all the file-reading stuff to be cleaned up, but no need to wait on that since it's all working now).

Looking forward to it!

clangd/ClangdLSPServer.cpp
109

It doesn't seem to work unfortunately. I'm not sure why yet. Either way, I think I would add this test as a separate patch if we are to add a standard library include because I'm a bit nervous it will break and we will have to revert it :)

111

Good catch. I initialized SupportedSymbolKinds with defaultSymbolKinds() and let this loop add additional symbol kinds.

clangd/ClangdServer.h
162

I think it makes sense. We can reintroduce an option struct when there is more than one thing in it.

clangd/Protocol.cpp
209

Null Is not in LSP 1.0 and 2.0 that's why I had put Variable at the beginning. Maybe String is better. Not sure what you think is best.

Still LG, thanks!
I'll look into the testing issue.

clangd/ClangdLSPServer.cpp
113

nit: the bounds checks at usage types, with explicit underlying type for the enum are inconsistent with what we do in other protocol enums, and seem to clutter the code.

All else equal, I'd prefer to have an enum/enum class with implicit type, and not have values that are out of the enum's range. It's a simpler model that matches the code we have, and doesn't need tricky casts to SymKindUnderlyingType. If we want to support clients that send higher numbers than we know about, we could either:

  • overload fromJSON(vector<SymbolKind>)&
  • just express the capabilities as vector<int> and convert them to the enum (and check bounds) at this point.

WDYT?

clangd/FindSymbols.h
27

Would be nice to have a comment describing: query syntax, limit=0 special behavior.

Still LG, thanks!
I'll look into the testing issue.

I thought about it after... I think it was because I was trying to test with std::unordered_map (to prevent multiple results) which needs std=c++11, I'll try with something else.

Still LG, thanks!
I'll look into the testing issue.

I thought about it after... I think it was because I was trying to test with std::unordered_map (to prevent multiple results) which needs std=c++11, I'll try with something else.

Worth a shot, but don't count on it - for c++ clang switched to using std=c++11 by default a while ago.

@malaperle, what's your plan of this patch? Are you going to land it before D45513? With the Line&Column info in the index, this patch could be simplified.

@malaperle, what's your plan of this patch? Are you going to land it before D45513? With the Line&Column info in the index, this patch could be simplified.

I'll address the last comment and wait for review. Probably at least another day of delay. So if that ends up after D45513 then I'll update it to simplify. Sounds good?

Still LG, thanks!
I'll look into the testing issue.

I thought about it after... I think it was because I was trying to test with std::unordered_map (to prevent multiple results) which needs std=c++11, I'll try with something else.

Worth a shot, but don't count on it - for c++ clang switched to using std=c++11 by default a while ago.

I just tried "std::basic_ostringstream" and that works. Seems safer.

malaperle updated this revision to Diff 142328.Apr 12 2018, 9:41 PM
malaperle marked an inline comment as done.

Address comments.

malaperle marked an inline comment as done.Apr 12 2018, 9:41 PM
malaperle added inline comments.
clangd/ClangdLSPServer.cpp
113

I think it's better to keep vector<SymbolKind> in Protocol.h, it is clearer and more in line with the protocol. I overloaded fromJSON, which simplifies the code here, but it still needs a static_cast to size_t for the bitset.set(). Other places there still needs a static_cast, like in defaultSymbolKinds for the loop, I can static_cast to size_t everywhere (or int?) but having SymKindUnderlyingType seems more correct. I changed it to size_t, let me know if it was something like that you had in mind.

@malaperle, what's your plan of this patch? Are you going to land it before D45513? With the Line&Column info in the index, this patch could be simplified.

I'll address the last comment and wait for review. Probably at least another day of delay. So if that ends up after D45513 then I'll update it to simplify. Sounds good?

Thanks, I have landed the patch today, you need to update your patch (I think it is mostly about removing the code of reading-file stuff).

sammccall accepted this revision.Apr 13 2018, 2:09 AM

Still LG

clangd/ClangdLSPServer.cpp
113

LG, though I've commented in one place where the cast seems necessary.

Personally I usually use int here, but size_t is good too. The exact type doesn't matter as long as it covers all values of the enum.

274

you no longer need the min/max asserts and their casts, because we don't allow any values of type SymbolKind that don't have one of the enum values (i.e. we're just echoing the type system here)

malaperle updated this revision to Diff 142403.Apr 13 2018, 7:20 AM
malaperle marked 2 inline comments as done.

Address comments. Simplify with using line/col from index.

Thanks, I have landed the patch today, you need to update your patch (I think it is mostly about removing the code of reading-file stuff).

I updated the patch using the new line/col from the index. This is great!

Any objections to the latest version?

Sorry, missed this.
Just a few leftovers from removing the line/col conversion code.
Then ship it!

clangd/ClangdServer.h
164

DraftStore is now unused and can be dropped :-)
(also the forward decl of it)

clangd/FindSymbols.cpp
12

dead include

clangd/FindSymbols.h
14

nit: header guards no longer match filename

25

dead forward decl (and the VirtualFileSystem and IntrusiveRefCntPtr includes)

hokein added inline comments.Apr 18 2018, 8:01 AM
clangd/FindSymbols.cpp
118

The current URI scheme (file, test) works fine without HintPath because they don't use it, but for some custom URI schemes, they might rely on the HintPath(e.g. the path of current main file) to resolve the absolute path correctly. Maybe add a FIXME here?

malaperle updated this revision to Diff 143219.Apr 19 2018, 7:53 PM
malaperle marked 5 inline comments as done.

Address comments.

Sorry for the embarrassing clean-ups I forgot!

malaperle updated this revision to Diff 143220.Apr 19 2018, 7:57 PM

Remove more includes.

This revision was automatically updated to reflect the committed changes.

Still LG, thanks!
I'll look into the testing issue.

I thought about it after... I think it was because I was trying to test with std::unordered_map (to prevent multiple results) which needs std=c++11, I'll try with something else.

Worth a shot, but don't count on it - for c++ clang switched to using std=c++11 by default a while ago.

I just tried "std::basic_ostringstream" and that works. Seems safer.

So this fails if there's no standard library available without flags, which is the case in google's test environment to ensure hermeticity :-(

In the short-term, we've disabled the test internally - did it trigger any buildbot failures?
In the medium term we should probably find a better way to test this :(

So this fails if there's no standard library available without flags, which is the case in google's test environment to ensure hermeticity :-(

In the short-term, we've disabled the test internally - did it trigger any buildbot failures?
In the medium term we should probably find a better way to test this :(

No buildbot failures yet. But I suggest we just remove the test and add it back when we collect more symbols, i.e. in main files. WDYT?

So this fails if there's no standard library available without flags, which is the case in google's test environment to ensure hermeticity :-(

In the short-term, we've disabled the test internally - did it trigger any buildbot failures?
In the medium term we should probably find a better way to test this :(

No buildbot failures yet. But I suggest we just remove the test and add it back when we collect more symbols, i.e. in main files. WDYT?

It makes sense, but @bkramer came up with some deep magic in rL330754 so I think we're actually good now.

It makes sense, but @bkramer came up with some deep magic in rL330754 so I think we're actually good now.

Nice! Thanks @bkramer !