This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add more symbol information for code completion.
ClosedPublic

Authored by ioeric on Dec 18 2017, 3:32 AM.

Event Timeline

ioeric created this revision.Dec 18 2017, 3:32 AM
sammccall added inline comments.Dec 19 2017, 12:09 AM
clangd/index/Index.h
96

AFAIK this information isn't needed for retrieval/scoring, just for display.

LSP has completionItem/resolve that adds additional info to a completion item. This allows us to avoid sending a bunch of bulky comments, most of which will never be looked at.

In practice, there's nothing we particularly want to do differently for the memory index: we have to load the data into memory, and so including a pointer to it right away is no extra work.

However Symbol has two roles, and being the in-memory representation for MemIndex is only secondary. Its primary role is defining the protocol between indexes and clangd, including remote indexes where returning all documentation *is* expensive.

One option is to have Symbol just have the "core stuff" that's suitable for returning in bulk, and have an index method to retrieve more details that would be a point lookup only. (Maybe this is just the getSymbol method we've thought about). I'm not sure what it means for the data structure. OK if we discuss offline?

103

What are you planning to put here other than the return type of a function?
It's probably OK if we explicitly want this to be "whatever should go in the LSP detail field" but we should think about whether there's something more specific we can say.

104

How are you planning to use this?

This seems to be related to the completion text/edits. We had some early discussions about whether we'd encode something like CodeCompletionString, or LSP snippets, or something else entirely.
Honestly it would be great to have a doc describing this mapping between source -> index -> LSP for completion data.

clangd/index/SymbolCollector.cpp
59

it seems we'll want to share the(some?) doc logic between hover, AST-based complete, and indexing... See D35894 (which is ... complicated, no idea if it'll land soon).

Among other things:

  • we may not want to make the logic too elaborate until we're able to merge interfaces
  • we may want to consume AST nodes rather than CCS in the long run
65

Should these annotations go at the end?

88

nit: lowercase

88

How does this relate to the code in AST-based completion?

ioeric updated this revision to Diff 127730.Dec 20 2017, 8:07 AM
ioeric marked 4 inline comments as done.
  • Merge with origin/master
  • Fixed an error in merge
  • Make documentation etc optional in symbols
  • Merge remote-tracking branch 'origin/master' into symbol
  • Merge branch 'index-completion' into symbol
  • Address review comments; merge with D41450.

Thanks for the review!

Logic around CodeCompletionString is pulled into a separate file in D41450.

I also propagate the new completion info to completion code. I am happy to split it out if it adds too much noise for the review.

clangd/index/Index.h
96

As discussed offline, putting non-core stuff in an optional structure.

104

As discussed offline, we now store the whole snippet in the insertion text.

clangd/index/SymbolCollector.cpp
59

I pulled CodeCompletionString handling logic into a separate file in D41450.

sammccall added inline comments.Dec 20 2017, 9:32 AM
clangd/index/Index.h
105

insert texts can be in details I think? They're not required for completion until after resolve.

122

I think you probably want a raw pointer rather than optional:

  • reduce the size of the struct when it's absent
  • make it inheritance-friendly so we can hang index-specific info off it

(raw pointer rather than unique_ptr because it's owned by a slab not by malloc, but unique_ptr is ok for now)

clangd/index/SymbolCollector.cpp
61

this doesn't seem like the kind of thing we should be allocating per-symbol!
You can use a single one and Reset() it, I think

also why globalcodecompletionallocator, vs just codecompletionallocator?

ioeric updated this revision to Diff 127781.Dec 20 2017, 1:42 PM
ioeric marked an inline comment as done.
  • Merged with origin/master
  • Addressed some more comments.
  • Add new fields to YAML.
clangd/index/Index.h
105

Quote from https://github.com/Microsoft/language-server-protocol/blob/master/protocol.md#completion-request

The request can delay the computation od the detail and documentation properties. However, properties that are needed for the inital sorting and filtering, like sortText, filterText, insertText, and textEdit must be provided in the textDocument/completion request and must not be changed during resolve.
122

This is not easy for now with unique_ptr because of this line :( https://github.com/llvm-mirror/clang-tools-extra/blob/3565d1a1a692fc9f5c21e634b470535da2bb4d25/clangd/index/SymbolYAML.cpp#L141).

This shouldn't be an issue when we have the optimized symbol slab, where we store raw pointers. And we would probably want to serialize the whole slab instead of the individual symbols anyway.

reduce the size of the struct when it's absent

llvm::Optional doesn't take much more space, so the size should be fine.

make it inheritance-friendly so we can hang index-specific info off it

Could you elaborate on index-specific info? It's unclear to me how this would be used.

clangd/index/SymbolCollector.cpp
61

You can use a single one and Reset() it, I think

It's unclear how reset would affect CodeCompletionTUInfo, but we can simply maintain one allocator for each TU.

also why globalcodecompletionallocator, vs just codecompletionallocator?

They are actually the same thing, but CodeCompletionTUInfo requires a global one. I don't really know why...

sammccall added inline comments.Dec 21 2017, 1:23 AM
clangd/index/Index.h
122

This is not easy for now with unique_ptr because of this line

Oh no, somehow i missed this during review.
We shouldn't be relying on symbols being copyable. I'll try to work out how to fix this and delete the copy constructor.

This shouldn't be an issue when we have the optimized symbol slab, where we store raw pointers.

Sure. That's not a big monolithic/mysterous thing though, storing the details in the slab can be done in this patch... If you think it'll be easier once strings are arena-based, then maybe we should delay this patch until that's done, rather than make that work bigger.

And we would probably want to serialize the whole slab instead of the individual symbols anyway.

This i'm less sure about, but I don't think it matters.

llvm::Optional doesn't take much more space, so the size should be fine.

Optional takes the same size as the details itself (plus one bool). This is fairly small for now, but I think a major point of Details is to expand it in the future?

Could you elaborate on index-specific info? It's unclear to me how this would be used.

Yeah, this is something we talked about in the meeting with Marc-Andre but it's not really obvious - what's the point of allowing Details to be extended if clangd has to consume it?

It sounded like he might have use cases for using index infrastructure outside clangd. We might also have google-internal index features we want (matching generated code to proto fields?). I'm not really sure how compelling this argument is.

clangd/index/SymbolCollector.h
25

can you add a comment to the class indicating that it needs to be used for one TU and then thrown away? This seems unfortunate but is probably simpler than the alternative. It also seems to be a new restriction with this patch.

ioeric updated this revision to Diff 128505.Jan 3 2018, 2:48 AM
  • Merge with origin/master. Use Arena for symbol details.
ioeric added inline comments.Jan 3 2018, 3:08 AM
clangd/index/Index.h
122

Thanks for the allocator change!

Details now contains just stringrefs. I wonder how much we want it to be inherit-friendly at this point, as the size is relatively small now. If you think this is a better way to go, I'll make the structure contain strings and store the whole structure in arena. This would require some tweaks for yamls tho but shouldn't be hard.

clangd/index/SymbolCollector.h
25

How about re-initializing the allocator for each new AST in intialize?

sammccall accepted this revision.Jan 9 2018, 1:07 AM

Just some nits

clangd/index/Index.h
122

So this needs to be at least optional I think - at the moment, how does an API user know whether to fetch the rest of the details?

FWIW, the size difference is already significant: symbol is 136 bytes if this is a unique_ptr, and 164 bytes if it's optional (+20%) - StringRefs are still pretty big.
But I don't feel really strongly about this.

clangd/index/SymbolCollector.cpp
72

you shouldn't initialize these both here and in initialize(), right?

131

why conditional? note we're interning the strings anyway, and an empty stringref is the same size as a full one

This revision is now accepted and ready to land.Jan 9 2018, 1:07 AM
ioeric updated this revision to Diff 129058.Jan 9 2018, 3:43 AM
ioeric marked 2 inline comments as done.
  • [clangd] Address review comments; made Detail a pointer.
ioeric updated this revision to Diff 129059.Jan 9 2018, 3:47 AM
ioeric marked an inline comment as done.
  • Minor cleanup.
ioeric updated this revision to Diff 129075.Jan 9 2018, 6:48 AM
clangd/index/Index.h
122

Done. Made it a raw pointer. PTAL

sammccall accepted this revision.Jan 9 2018, 9:13 AM
sammccall added inline comments.
clangd/index/SymbolYAML.cpp
84

this removes the Detail object if it's empty - this seems maybe unneccesary and certainly the wrong layer. It seems enough to do

if (!outputting)
  Detail = (allocate)
else if (!Detail)
  return;

io.mapOptional("Documentation", Detail->Documentation);
// etc
ioeric updated this revision to Diff 129106.Jan 9 2018, 9:30 AM
ioeric marked an inline comment as done.
  • Addrress review comment.
This revision was automatically updated to reflect the committed changes.