This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Drop impossible completions (unavailable or inaccessible)
ClosedPublic

Authored by sammccall on Nov 9 2017, 3:01 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Nov 9 2017, 3:01 AM
ilya-biryukov edited edge metadata.Nov 9 2017, 3:15 AM
ilya-biryukov added a subscriber: klimek.

I personally think they're useful to have anyway, and they don't add much noise when they're at the end of completions list.
I sometimes want to complete an item I know is there and then fix accessibility. Do you think they add too much noise?

I think I'm almost alone in this camp, though. (I remember @klimek saying he thinks they're sometimes useful too, but I may be mistaken)

I personally think they're useful to have anyway, and they don't add much noise when they're at the end of completions list.
I sometimes want to complete an item I know is there and then fix accessibility. Do you think they add too much noise?

For me the noise definitely outweighs the value, e.g. when I'm trying to see all the available functions. There are costs beyond noise:

  • they bulk up tests and debugging output
  • they complicate the scoring scheme - we *never* want them to appear above a legal completion, so a 1-dimensional score needs special handling as here.
  • compute: we spend time scoring them, computing their strings, writing them over the wire, and then the client has to process them
klimek added a comment.Nov 9 2017, 3:29 AM

I personally think they're useful to have anyway, and they don't add much noise when they're at the end of completions list.
I sometimes want to complete an item I know is there and then fix accessibility. Do you think they add too much noise?

For me the noise definitely outweighs the value, e.g. when I'm trying to see all the available functions. There are costs beyond noise:

  • they bulk up tests and debugging output
  • they complicate the scoring scheme - we *never* want them to appear above a legal completion, so a 1-dimensional score needs special handling as here.
  • compute: we spend time scoring them, computing their strings, writing them over the wire, and then the client has to process them

I do think they are useful to me, but I don't know what the percentages of folks who want them vs. not are, and whether people have actually tried multiple schemes.
I think it's more likely that we get bug reports to put them in if we don't show them than the other way round, so I think taking them out is a good way to get that feedback :)

ilya-biryukov accepted this revision.Nov 9 2017, 4:21 AM

This is probably one of the things that I'd like to be configurable.

  • they bulk up tests and debugging output

I'd argue that the more you put into tests, the more potential errors you'll see. E.g. if we're not showing private members, we might miss some presentation issues that only those exhibit. And you can always leave them out in tests in case you only want to test the public ones (though that's not a good idea, probably).
As for debugging, I don't LSP is any good as a debugging format anyway, we probably want some different form of output.

  • they complicate the scoring scheme - we *never* want them to appear above a legal completion, so a 1-dimensional score needs special handling as here.

We could always use a 2-dimensional score (accessible, sort_score). I'd love to have some editor support to grey-out some items (not only inaccessible, deprecated items could have been somehow signalled in completion UI as well), probably requires some changes to LSP too and not a priority, though.

  • compute: we spend time scoring them, computing their strings, writing them over the wire, and then the client has to process them

I don't think it matters much, given that there aren't many items that are inaccessible compared to the total number of completion items when it's slow (they are all class members, but the biggest portion of items come from namespaces). And after-dot completion is usually fast (and when it's slow it's because reparse is slow, not because we take too much time to process the items).

That being said, I don't think there's a chance I'll argue my way through this. Let's turn them off and see if someone (besides me) complains. LGTM.

This revision is now accepted and ready to land.Nov 9 2017, 4:21 AM

I would prefer to make this behaviour configurable.

So I probably should have started from the other end, here :-)

I'd really like to make the completion retrieval and ranking more flexible. In particular

  • incorporating results from other sources (indexes: both in-memory and external services).
  • combining more quality signals like usage count and fuzzy-match-strength in a non-lexicographic-sort way

The biggest difficulty in *supporting* unusable functions is maintaining the invariant that all unusable functions are ranked lower than usable ones - all code that deals with ranking has to deal with this special case, e.g. by making score a tuple instead of a single number.

If the current approach of "give them a penalty" is enough, knowing that in the future it may lead to e.g. a very widely used but inaccessible protected function being ranked highly, then that seems fine to me too. A wider configuration space means testing is more work, but happy to live with it. What do you think?

(With my user-hat on, configurable is fine, though I do strongly feel they should be off by default, and it seems unlikely many users will change the defaults.)

So I probably should have started from the other end, here :-)

I'd really like to make the completion retrieval and ranking more flexible. In particular

  • incorporating results from other sources (indexes: both in-memory and external services).
  • combining more quality signals like usage count and fuzzy-match-strength in a non-lexicographic-sort way

The biggest difficulty in *supporting* unusable functions is maintaining the invariant that all unusable functions are ranked lower than usable ones - all code that deals with ranking has to deal with this special case, e.g. by making score a tuple instead of a single number.

That sounds good to me. Please keep in mind that not all clients might want to take advantage of these things as they might have their own fuzz-match logic and external result injection.

If the current approach of "give them a penalty" is enough, knowing that in the future it may lead to e.g. a very widely used but inaccessible protected function being ranked highly, then that seems fine to me too. A wider configuration space means testing is more work, but happy to live with it. What do you think?

(With my user-hat on, configurable is fine, though I do strongly feel they should be off by default, and it seems unlikely many users will change the defaults.)

I'd be ok with off by default, as long as it's possible to turn it on :)

So I probably should have started from the other end, here :-)

I'd really like to make the completion retrieval and ranking more flexible. In particular

  • incorporating results from other sources (indexes: both in-memory and external services).
  • combining more quality signals like usage count and fuzzy-match-strength in a non-lexicographic-sort way

The biggest difficulty in *supporting* unusable functions is maintaining the invariant that all unusable functions are ranked lower than usable ones - all code that deals with ranking has to deal with this special case, e.g. by making score a tuple instead of a single number.

That sounds good to me. Please keep in mind that not all clients might want to take advantage of these things as they might have their own fuzz-match logic

Yup! My understanding of the protocol is clients will generally trigger completion after the dot, and then filter client-side *unless* the result set is incomplete.
So fuzzy-match filtering/scoring would only be triggered if we limit the number of results (proposed as optional in D39852) or if completion is explicitly triggered mid-identifier (maybe we should allow turning this off too).

and external result injection.

Absolutely. To be concrete, i'm thinking about three overlaid:

  1. a static generated index that clangd consumes (e.g. generated by index-while-building and located on disk, or a hosted index of google's monorepo)
  2. a dynamically generated index of symbols in files that clangd has seen. (Likely to be dirty with respect to 1)
  3. context-sensitive completion results as today

1 would definitely be optional.
2 isn't really external... we might want to handle global symbols with 2 and not 3 as now, but that's an implementation detail.

If the current approach of "give them a penalty" is enough, knowing that in the future it may lead to e.g. a very widely used but inaccessible protected function being ranked highly, then that seems fine to me too. A wider configuration space means testing is more work, but happy to live with it. What do you think?

(With my user-hat on, configurable is fine, though I do strongly feel they should be off by default, and it seems unlikely many users will change the defaults.)

I'd be ok with off by default, as long as it's possible to turn it on :)

Sure, I'll update the patch.

For these things that are (I assume) more "user preference" than project- or integration-specific, I wonder whether command-line flags are the right thing. Configuration in ~/.clangd is a can of worms, but might be the right thing in the long-run.

sammccall planned changes to this revision.Nov 9 2017, 4:59 PM
sammccall updated this revision to Diff 123817.Nov 21 2017, 9:16 AM

Add an option to allow impossible completions.
While here, remove 'helpful' constructors that take some subset of the params.

This revision is now accepted and ready to land.Nov 21 2017, 9:16 AM
sammccall updated this revision to Diff 123819.Nov 21 2017, 9:22 AM

(trying to resync the diff to head)

sammccall requested review of this revision.Nov 21 2017, 9:24 AM
sammccall edited edge metadata.

PTAL - this is now optional. The results will be hidden by default, but can be turned on with a flag.

ilya-biryukov added inline comments.Nov 22 2017, 2:10 AM
clangd/tool/ClangdMain.cpp
169 ↗(On Diff #123819)

We should also set IncludeCodePatterns = EnableSnippets here. May be worth adding a test that they are in the completion results.

sammccall added inline comments.Nov 23 2017, 3:44 AM
clangd/tool/ClangdMain.cpp
169 ↗(On Diff #123819)

Good catch!
Unfortunately our previous attempt this failed because static_cast is not a code pattern (probably it should be).

  • Added a test to completion-items-kinds.
  • Added documentation to the flag to indicate it does this
  • Changed the CodePatterns default to true (equivalent to setting it here, and keeps the defaults in one place)
  • set flag defaults based on CompleteOption defaults so they're in one place
sammccall updated this revision to Diff 124066.Nov 23 2017, 4:39 AM

Address review comments

This revision is now accepted and ready to land.Nov 23 2017, 5:32 AM
This revision was automatically updated to reflect the committed changes.