This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Don't store completion info if the symbol is not used for code completion.
ClosedPublic

Authored by hokein on Jan 4 2019, 5:16 AM.

Details

Diff Detail

Repository
rL LLVM

Event Timeline

hokein created this revision.Jan 4 2019, 5:16 AM
nridge added a subscriber: nridge.Jan 5 2019, 7:38 PM

Might we want to keep some of this information for workspace/symbol? I mean, surely not "documentation", but perhaps "signature" and "return type"?

For example, when cquery responds to workspace/symbol, it puts into SymbolInformation.name a "detailed name" that includes the return type and signature for functions, allowing clients to display these in their "Open Element" dropdown. This is in turn useful for e.g. selecting the correct one among multiple overloads of a function to jump to.

@hokein, do you need reviewers for this? I'm happy to volunteer.

Might we want to keep some of this information for workspace/symbol? I mean, surely not "documentation", but perhaps "signature" and "return type"?

There's nothing stopping us from reintroducing this information if we start doing the same. I don't foresee difficulties with this. It would be easier to figure out the bits we actually need when we implement this functionality.
But I generally prefer to move fast and fix things as you go, even if that means going back and forth, others might disagree.

hokein edited the summary of this revision. (Show Details)Jan 7 2019, 5:41 AM
hokein removed a subscriber: ilya-biryukov.

@hokein, do you need reviewers for this? I'm happy to volunteer.

Thanks.

Might we want to keep some of this information for workspace/symbol? I mean, surely not "documentation", but perhaps "signature" and "return type"?

There's nothing stopping us from reintroducing this information if we start doing the same. I don't foresee difficulties with this. It would be easier to figure out the bits we actually need when we implement this functionality.
But I generally prefer to move fast and fix things as you go, even if that means going back and forth, others might disagree.

+1, we don't use the signature and return type in the workspace/symbol at the moment. We could revisit it when we actually need them.

ilya-biryukov added inline comments.Jan 7 2019, 7:59 AM
clangd/index/Index.h
232 ↗(On Diff #180233)

This comment would be most useful beside the mentioned fields themselves. Maybe add it there too? Possibly with a shorter form, since there's no need to mention the field names there.

clangd/index/SymbolCollector.cpp
543 ↗(On Diff #180233)

Most of the fields updated at the bottom aren't useful. However, I feel the documentation is actually important, since Sema only has doc comments for the current file and the rest are currently expected to be provided by the index.

I'm not sure if we already have the code to query the doc comments via index for member completions. If not, it's an oversight.
In any case, I suggest we always store the comments in dynamic index. Not storing the comments in the static index is fine, since any data for member completions should be provided by the dynamic index (we see a member in completion ⇒ sema has processed the headers ⇒ the dynamic index should know about those members)

hokein marked an inline comment as done.Jan 8 2019, 2:34 AM
hokein added inline comments.
clangd/index/SymbolCollector.cpp
543 ↗(On Diff #180233)

This is a good point.

For class member completions, we rely solely on Sema completions (no query being queried). I'm not sure it is practical to query the index for member completions.

  • this means for every code completion, we query the index, it may slow down completions
  • fuzzyFind is not supported for class members in our internal index service (due to the large number of them)

So it turns two possibilities:

  1. always store comments (SymbolCollector doesn't know whether it is used in static index or dynamic index)
  2. or drop them for now, this wouldn't break anything, and add it back when we actually use them for class completions

I slightly prefer 2) at the moment. WDYT?

ilya-biryukov added inline comments.Jan 8 2019, 7:35 AM
clangd/index/SymbolCollector.cpp
543 ↗(On Diff #180233)

Yeah, instead of using fuzzyFind(), we'll call lookup() to get details of the symbols we've discovered.
It's true that this is going to add some latency, but I hope we have the latency budget to handle this (these queries should be fast, e.g. we're doing this for signature help and I haven't seen any noticeable latency there from the index query, most of the running time is parsing C++).

I also like option 2, but unfortunately we already rely on this to get the comments in signature help, so this change would actually introduce regressions there (less used than code completion, but still not nice to break it)
Would the third option of having a config option (e.g. SymbolCollector::Options::StoreAllComments) work? clangd-indexer and auto-indexer would set the option to false, indexSymbols would set the option to true. We'll both get the optimizations and avoid introducing any regressions. The plumbing should be no more than a few lines of code.

hokein updated this revision to Diff 180808.Jan 9 2019, 2:49 AM
hokein marked 2 inline comments as done.

Address comments, store docs.

hokein added inline comments.Jan 9 2019, 2:50 AM
clangd/index/SymbolCollector.cpp
543 ↗(On Diff #180233)

Sounds fair. I totally missed signature help, it is unfortunate that our test case doesn't find this regression (added one!)

Would the third option of having a config option (e.g. SymbolCollector::Options::StoreAllComments) work? clangd-indexer and auto-indexer would set the option to false, indexSymbols would set the option to true. We'll both get the optimizations and avoid introducing any regressions. The plumbing should be no more than a few lines of code.

This is a reasonable solution, but I prefer to do it in a separated patch. Now I make this patch always store docs, which should not introduce any regressions. There is another optimization opportunity here -- unlike header symbols, docs from main-file symbols can be dropped from the index, we can retrieve them from Sema.

ilya-biryukov accepted this revision.Jan 9 2019, 5:38 AM

LGTM

clangd/index/Index.h
188 ↗(On Diff #180808)

NIT: Maybe change to only set when the symbol...? "Meaningful" might create confusion.

clangd/index/SymbolCollector.cpp
538 ↗(On Diff #180808)

NIT: consider inlining the lamda, the code inside it is simple enough.
Up to you, though.

543 ↗(On Diff #180233)

Totally, the main file symbol docs are not necessary. OTOH, the savings for the main files should be almost negligible, since there are not that many main files open at a time and comments is just a small fraction of the information we store for each file (preabmles, source code contents, etc.)

This revision is now accepted and ready to land.Jan 9 2019, 5:38 AM
hokein updated this revision to Diff 180827.Jan 9 2019, 5:53 AM
hokein marked an inline comment as done.

address review comments

ilya-biryukov accepted this revision.Jan 9 2019, 6:47 AM

LGTM again :-) I bet the savings are less now that we're always storing the comments in the static index, so the numbers in the description might be outdated.

hokein edited the summary of this revision. (Show Details)Jan 9 2019, 7:52 AM

LGTM again :-) I bet the savings are less now that we're always storing the comments in the static index, so the numbers in the description might be outdated.

Yes, fixed.

This revision was automatically updated to reflect the committed changes.