This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Index-based code completion.
ClosedPublic

Authored by ioeric on Dec 15 2017, 4:05 AM.

Details

Event Timeline

ioeric created this revision.Dec 15 2017, 4:05 AM
ioeric updated this revision to Diff 127118.Dec 15 2017, 6:10 AM
  • Merge remote-tracking branch 'origin/master' into index-completion
  • Fix merge with origin/master.

A few drive-by comments, will look deeper if I have time but don't wait for me.

clangd/Position.cpp
15

This is kind of note-to-self, but there are things to fix here

  • this seems to handle \r\n fine, no FIXME needed
  • UTF8 comment should be // FIXME: we're counting UTF8 bytes, LSP wants UTF16 code units
  • we should return Code.size() when we don't find a \n
  • we should cap the normal return value at Code.size()
  • Offset is weird - it should consistently point after the \n (init to 0, Offset = F + 1), so -1 at the end seems like a bug
35

This is deep magic (how it handles npos) and I'd like to replace it with something more explicit

clangd/Position.h
1

This seems like a really specific name - one possible generalization is "utilities that examine source code directly" - SourceCode.h?
This is just a hunch that we might have some more such logic around preambles, language detection, etc.

Hope you don't mind me piling on... let me know!
Mostly readability, but i'd also like to reduce the scope w.r.t namespace completion.

clangd/CodeComplete.cpp
90

toCompletionItemKind?

getKindOfSymbol doesn't match either the input or output, and they have very similar names...

(yes, the names above are bad too - feel free to fix them!)

103

LSP has struct (not in protocol yet)

fine for these to be FIXME

111

this seems wrong? I would have thought references are similar to c++ references?
can't find anything in LSP either way.

TypeAlias seems like it could default to class, for Using is complicated (would need to look at the subtype) - maybe just unknown?

114

conversionfunction might be rather operator, not sure

121

LSP has enum constant, not in protocol.h yet

131

destructor seems more like an operator or method

280

qualified

282

nit: 'info' is just noise. Specifier is vague too - consider calling this SpecifiedScope, putting the emphasis on the user interaction.

283

There's a lot of sema/ast details here, wedged in the middle of the code that deals with candidates and scoring. Can you forward declare this, and move this implementation and the other details somewhere lower?

283

"create" doesn't give any more semantics than a constructor would.

Maybe extractCompletionScope? (This could also be a free function)

310

The written/resolved distinction is tangled up with the various fields relating to the written form.

Could you pull out a struct here like:
struct SourceRange {size_t Begin; size_t End; std::string Text}

(This might be a reasonable candidate for `SourceCode.h)

Then this struct can become struct ScopeSpecifier { SourceRange Written, std::string Resolved }

322

nit: symbol is impossibly vague here - since we're always talking about symbols and it's hard to know you mean index::Symbol.

I'd call this indexCompletionItem, but just toCompletionItem seems fine too

325

as discussed offline: fuzzy matching and rewriting qualifiers is a can of worms.
I'd suggest we don't open it yet: just complete the symbols from exactly the typed scope.
Next we can work on making global completion work (empty qualifier can complete any namespace's symbols).
Finally we'll see if we need to tackle fuzzy-matching the parent namespace.

Among other things, this means that the first step doesn't have to rewrite qualifiers at all, the second only needs to insert them, and we defer thorny cases until the third.

347

we're a bit lax about the "method name should start with a verb" rule, but this one I actually found hard to parse.

completeWithIndex? (since there's no unqualified case yet, and there's no need to guess about how the code would be structured at this point)

362

suggest we do this in a structured way instead:
Index::fuzzyFind query has a vector<string> of namespaces to look up in

373

Please look for names that will tell the reader what the role of this thing is, and only add comments that provide extra information.

SemaCompletionInfo accurately describes most of the types in this file!

Best i can come up with is NameToComplete or LeadingText - not great :-(

375

the partial identifier that is being completed, without qualifiers

380

It's not clear what's to fix here - isn't non-qualified ID completion already represented by {Filter, None}?
(Feel free just to remove the comment if it's not clear what's to be done)

397–401

nit: auto seems a lot easier to read here?

Here are a few more comments.

clangd/CodeComplete.cpp
329

NIT: FIXME(ioeric)?

334

NIT: FIXME(ioeric)?
Also, why not provide some sortText here? Like a qualified name. Because we're currently missing scores?

380

NIT: FIXME(ioeric)?

847

NIT: FIXME(ioeric)?
Also, what's missing to resolve this FIXME in this commit?

clangd/CodeComplete.h
64

Given the comment, maybe we want to pass it as a separate parameter instead?

rwols added a subscriber: rwols.Dec 18 2017, 10:46 AM
ioeric updated this revision to Diff 127406.Dec 18 2017, 1:49 PM
ioeric marked 21 inline comments as done.
  • Address comments in CodeComplete.cpp
  • Split index changes into a separate patch.
  • Merged with patch D41351.
  • Remove Position.* files.

Thanks for the review!

clangd/CodeComplete.cpp
111

TypeAlias could be more than class I think? And they are references (to types) to some extend. Anyhow, this is copied from the conversion function above. I'll leave a FIXME for now.

114

Operator is not in the protocol either. I leave a FIXME and will clean it up in followup.

283

This is moved to a standalone function extractCompletionScope.

310

Got rid of ranges as we don't need any them in this revision.

334

Does sortText have to be non-empty? I thought the empty string would make all candidates equally scored?

I think we want either sensible score or (for now) no score at all.

373

NameToComplete sounds reasonable.

847

Didn't realize logger is ready to use. Switched to use logger.

clangd/CodeComplete.h
64

@sammccall suggested this approach in the previous prototype patch. I also ended up preferring this approach because:

  1. it doesn't require changing ClangdServer interfaces, and the dynamic index should be built by ClangdServer and thus doesn't make sense to be a parameter.
  2. it enables unit tests to use customized dummy indexes.
  3. we might take another (static) index in the future, and it seems easier to pass it in via the options than adding another parameter.
ioeric updated this revision to Diff 127488.Dec 19 2017, 3:57 AM
  • Merged with origin/master
  • Merged with D41351
ioeric updated this revision to Diff 127509.Dec 19 2017, 5:48 AM
  • Fixed a bug when completing scope that starts with '::'.
  • Diff base on origin/master
ilya-biryukov added inline comments.Dec 19 2017, 5:58 AM
clangd/CodeComplete.cpp
334

LS says that if sortText is empty, label should be used. We're probably fine here, you're right.

clangd/CodeComplete.h
64
  1. it doesn't require changing ClangdServer interfaces, and the dynamic index should be built by ClangdServer and thus doesn't make sense to be a parameter.

We do have it as a parameter to codeComplete method, the fact that it's wrapped in a struct does not make much difference.
ClangdServer should probably accept it in a constructor instead if we want to override some stuff via the dynamic index instead.

  1. it enables unit tests to use customized dummy indexes.

unit-testing code can easily wrap any interface, this shouldn't be a problem.

  1. we might take another (static) index in the future, and it seems easier to pass it in via the options than adding another parameter.

I'm looking at CodeCompleteOptions as a configurable user preference rather than a struct, containing all required inputs of codeComplete. I don't think SymbolIndex is in line with other fields that this struct contains.

sammccall accepted this revision.Dec 19 2017, 6:22 AM
sammccall added inline comments.
clangd/CodeComplete.cpp
111

Yeah, this is the problem I was alluding to before - LSP encourages us to be very specific in completion kinds - but we can't always be. Class seems like the best guess to me, but FIXME is also fine for now.

283

Thanks for extracting the function. These index-related details are still out of place - they split the two chunks of code that mostly do sema-based completion.
Can you move indexCompletionItem, completeWithIndex, and the implementation of extractCompletionScope below? I'd suggest immediately above codeComplete().

clangd/CodeComplete.h
64

So yeah, this is my fault...
I do agree it's unfortunate to have a non-user-specified param in the CodeComplete options. It's tricky - from the perspective of this module the index really is such an option, and we want to propagate it around like one. We just don't (currently) want to encourage its use as an API to clangdserver.

codeComplete currently has 9(!) arguments. Different people will have different reactions to this, mine is to do almost anything to avoid a 10th :-)

unit-testing code can easily wrap any interface, this shouldn't be a problem.

This doesn't match my experience at all. Can you suggest how to test this logic without adding parameters to at least ClangdServer::codeComplete and the codeComplete test helpers?

ClangdServer should probably accept it in a constructor instead if we want to override some stuff via the dynamic index instead.

As of today, the dynamic index is the only index clangd supports. There's nothing to accept in the constructor, it's entirely owned by ClangdServer.

This revision is now accepted and ready to land.Dec 19 2017, 6:22 AM
ilya-biryukov accepted this revision.Dec 19 2017, 7:58 AM

LGTM. I still think that we should move the SymbolIndex out of the struct, but don't want to block this patch.

clangd/CodeComplete.h
64

codeComplete currently has 9(!) arguments. Different people will have different reactions to this, mine is to do almost anything to avoid a 10th :-)

I do agree this is unfortunate and I'm not at all opposed to cleaning that up. By removing parameters that we don't need or grouping them into a struct that has saner default parameters or in a different way. Both 9 and 10 seem equally bad to me.

My point is that SymbolIndex should not be part of CodeCompleteOptions only because it makes it easier to pass it around.

This doesn't match my experience at all. Can you suggest how to test this logic without adding parameters to at least ClangdServer::codeComplete and the codeComplete test helpers?

Exactly, I would go with the test helper.

As of today, the dynamic index is the only index clangd supports. There's nothing to accept in the constructor, it's entirely owned by ClangdServer.

We do have implementations of the index in the tests that we pass around.

That being said, I don't want this change to be blocked by my opinion on this matter. This is a minor thing, compared to all the other changes in the patch. Just wanted to make a point that this field totally feels out of place.

ioeric updated this revision to Diff 127536.Dec 19 2017, 8:23 AM
ioeric marked an inline comment as done.
  • Move implementations around to make code easier to read.
ioeric updated this revision to Diff 127538.Dec 19 2017, 8:28 AM
  • Add a FIXME for Index in code completion options.
ioeric added inline comments.Dec 19 2017, 8:30 AM
clangd/CodeComplete.h
64

Thanks for the feedback! I added a FIXME for this. I am happy to clean this up when we have a saner way to config code completion in clangd :)

This revision was automatically updated to reflect the committed changes.