This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Add "member" symbols to the index
ClosedPublic

Authored by malaperle on Mar 27 2018, 1:47 PM.

Details

Summary

This adds more symbols to the index:

  • member variables and functions
  • enum constants in scoped enums

The code completion behavior should remain intact but workspace symbols should
now provide much more useful symbols.
Other symbols should be considered such as the ones in "main files" (files not
being included) but this can be done separately as this introduces its fair
share of problems.

Signed-off-by: Marc-Andre Laperle <marc-andre.laperle@ericsson.com>

Diff Detail

Repository
rL LLVM

Event Timeline

malaperle created this revision.Mar 27 2018, 1:47 PM
malaperle planned changes to this revision.Mar 27 2018, 1:52 PM
malaperle added inline comments.
clangd/index/Index.h
142 ↗(On Diff #139997)

I'm thinking of replacing this with something like:
Decl::Kind DeclContextKind;
bool IsScoped (enum);
bool IsInMainFile;

255 ↗(On Diff #139997)

Would be removed when "ForCompletion" is replaced.

clangd/index/MemIndex.cpp
48 ↗(On Diff #139997)

Would be removed when "ForCompletion" is replaced.

clangd/index/SymbolCollector.cpp
124 ↗(On Diff #139997)

This code would be replaced with setting individual Symbol fields instead.

342 ↗(On Diff #139997)

Here we would set fields that will replace "ForCompletion"

Rework to include member vars/functions and scoped enumerators.

malaperle retitled this revision from [clangd] [RFC] Add more symbols to the index to [clangd] Add "member" symbols to the index.May 11 2018, 12:41 PM
malaperle edited the summary of this revision. (Show Details)
malaperle added inline comments.May 11 2018, 12:44 PM
clangd/CodeComplete.cpp
932 ↗(On Diff #146391)

I want to add a comment here, but I want to make sure I understand why in the first place we were not indexing symbols outside these contexts for the purpose of code completion. Is it because those will be available by Sema code completion anyway?

@ioeric You mentioned in D46751 that it would make sense to add a flag to disable indexing members. Could you comment on that? What kind of granularity were you thinking? Would a "member" flag cover both class members (member vars and functions) and enum class enumerators for example? I think that would be reasonable. But I will also add symbols in main files too, so another flag for that? Hmmm.

A few questions regarding class members. To pinpoint some interesting cases and agree on how we want those to behave in the long run.

How do we handle template specializations? What will the qualified names of those instantiations be?
I.e. how do I query for push_back inside a vector? Which of the following queries should produce a result?

  • vector::push_back. Should it match both vector<T>::push_back and vector<bool>::push_back or only the first one?
  • vector<bool>::push_back
  • vector<int>::push_back

What scopes will non-scoped enum members have?
E.g. if I have enum En { A,B,C}, which of the following queries will and won't find enumerators?

  • En::A
  • ::A
  • A
clangd/CodeComplete.cpp
932 ↗(On Diff #146391)

C++ lookup rules inside classes are way more complicated than in namespaces and we can't possibly hope to give a decent approximation for those.
Moreover, completion inside classes does not require any non-local information, so there does not seem to be a win when using the index anyway.
So we'd rather rely on clang to do completion there, it will give way more useful results than any index implementation.

clangd/index/Index.h
160 ↗(On Diff #146391)

How do we use DeclContextKind?
Why did we decide to not go with a bool ForCompletion instead? (I'm probably missing a conversation in the workspaceSymbol review, could you point me to those instead?)

I'm asking because clang enums are very detailed and designed for use in the compiler, using them in the index seems to complicate things.
It feels we don't need this level of detail in the symbols. Similar to how we don't store the whole structural type, but rely on string representation of completion label instead.

@ioeric You mentioned in D46751 that it would make sense to add a flag to disable indexing members. Could you comment on that? What kind of granularity were you thinking? Would a "member" flag cover both class members (member vars and functions) and enum class enumerators for example? I think that would be reasonable. But I will also add symbols in main files too, so another flag for that? Hmmm.

Sam convinced me that members would still be interesting for our internal index service (e.g. locations of members would be useful for go-to-def). I'll investigate how much impact that would be by running the indexer with your change patched in, but I don't want to block you on that, so I'm fine with checking this in without any filter. We could revisit the filter design when needed.

We actually had an option for indexing symbols in main files :) But it was removed as it turned out to be unused for the features clangd had at that point. I think it would be reasonable to add it back if we start supporting collecting main file symbols again. Maybe out of the scope of this patch, but I am interested in the use cases you have for symbols in main files, besides in workspaceSymbols?

clangd/index/Index.h
160 ↗(On Diff #146391)

+1

ForCompletion sounds reasonable as the current design of index-based code completion relies on assumptions about contexts.

162 ↗(On Diff #146391)

In case we do need these fields, I think they should go into index::SymbolInfo (i.e. SymInfo above)? As Ilya said, Symbol might be the wrong level to put these. And we would want them in index-while-build anyway.

ioeric added inline comments.May 17 2018, 8:23 AM
clangd/index/SymbolCollector.cpp
113 ↗(On Diff #146391)

We should be careful when removing the InTopLevelScope filter. According to clang/AST/DeclBase.h, DeclContext can be any of the following. For example, indexing symbols in functions seems to be out of the scope of this patch. I think we should keep a whitelist instead of turning them all at once. It's unfortunately we don't have tests to catch those...

///   TranslationUnitDecl
///   NamespaceDecl
///   FunctionDecl
///   TagDecl
///   ObjCMethodDecl
///   ObjCContainerDecl
///   LinkageSpecDecl
///   ExportDecl
///   BlockDecl
///   OMPDeclareReductionDecl

For class members, I would be good to add more tests for symbol collector e.g. more realistic class layouts with constructor, friends etc.

A few questions regarding class members. To pinpoint some interesting cases and agree on how we want those to behave in the long run.

How do we handle template specializations? What will the qualified names of those instantiations be?
I.e. how do I query for push_back inside a vector? Which of the following queries should produce a result?

  • vector::push_back. Should it match both vector<T>::push_back and vector<bool>::push_back or only the first one?
  • vector<bool>::push_back
  • vector<int>::push_back

It's probably better to consider this in a future patch. Maybe something like the first suggestion: vector::push_back and match both. Otherwise, I would think it might be a bit too verbose to have to spell out all of the specialization. Maybe we could allow it too. So... all of the above? :)

What scopes will non-scoped enum members have?
E.g. if I have enum En { A,B,C}, which of the following queries will and won't find enumerators?

  • En::A
  • ::A
  • A

Hmm. I think all of them, since you can refer them like that in code too. Case #1 doesn't work but that was the case before this patch so it can probably be addressed separately. I'll add some tests though!

@ioeric You mentioned in D46751 that it would make sense to add a flag to disable indexing members. Could you comment on that? What kind of granularity were you thinking? Would a "member" flag cover both class members (member vars and functions) and enum class enumerators for example? I think that would be reasonable. But I will also add symbols in main files too, so another flag for that? Hmmm.

Sam convinced me that members would still be interesting for our internal index service (e.g. locations of members would be useful for go-to-def). I'll investigate how much impact that would be by running the indexer with your change patched in, but I don't want to block you on that, so I'm fine with checking this in without any filter. We could revisit the filter design when needed.

We actually had an option for indexing symbols in main files :) But it was removed as it turned out to be unused for the features clangd had at that point. I think it would be reasonable to add it back if we start supporting collecting main file symbols again. Maybe out of the scope of this patch, but I am interested in the use cases you have for symbols in main files, besides in workspaceSymbols?

It's also for textDocument/documentSymbol. For this, we technically don't need them in the static index since we could collect symbols when the document is opened, but we also want them for workspaceSymbols so we might as well use the same symbol collector, etc. There should be more things that would also use symbol in main files, like Type Hierarchy.

clangd/CodeComplete.cpp
932 ↗(On Diff #146391)

Makes sense. Thanks! I'll be able to document this better now with the full picture.

clangd/index/Index.h
160 ↗(On Diff #146391)

My thinking was that the "ForCompletion" boolean was too specific and tailored for one client use. I thought the Symbol information should not have that much hard-coded knowledge on how it would be used. It would be odd to have "ForWorkspaceSymbol", "ForDocumentSymbol", etc. That is why I was going for a slightly more detailed symbol information that opened the door for more arbitrary queries for symbol clients. But it complicates things a bit more and I'd be happy to bring back the "ForCompletion" if it makes more sense for now.

clangd/index/SymbolCollector.cpp
113 ↗(On Diff #146391)

For symbol in functions, they are not collected because "IndexOpts.IndexFunctionLocals = false" for the symbol collector. I'll investigate the other ones and add more tests.

It's also for textDocument/documentSymbol. For this, we technically don't need them in the static index since we could collect symbols when the document is opened, but we also want them for workspaceSymbols so we might as well use the same symbol collector, etc. There should be more things that would also use symbol in main files, like Type Hierarchy.

I see. Thanks! Looks like we would need to collect symbols in main file at some point. We could probably filter out main file symbols in dynamic index for code completion, but we would probably need a flag to turn off main file symbols in case this leads to a huge size increase. I would also expect different indexers (dynamic and static ) to share SymbolCollector, but it should be easily configured to behave differently via SymbolCollector::Options. Anyhow, let's discuss more about this when we are turning on main file symbols :)

clangd/index/Index.h
160 ↗(On Diff #146391)

I think code completion, with the most complicated use of the index so far, probably deserves a flag :P I would expect/hope other features to be less "picky" about symbols. A high-level flag like ForCompletion would help keep knowledge about filtering for code completion (e.g. enums ...) inside symbol collector, which I think could be a win.

FWIW, ForCompletion makes it sound like a symbols is collected specifically for code completion. Maybe something like bool SupportGlobalCompletion would be better?

It's probably better to consider this in a future patch. Maybe something like the first suggestion: vector::push_back and match both. Otherwise, I would think it might be a bit too verbose to have to spell out all of the specialization. Maybe we could allow it too. So... all of the above? :)

This certainly does not have to be addressed in this patch. Just wanted to collect opinions on what the behavior we want in the long term.
My thoughts would be towards allowing only vector::push_back and make it match both push_backs: in vector<bool> and vector<T>.
Other cases might work too, but I wouldn't try implementing something that matches specializations. It's just too complicated in the general case. This will only be used in workspaceSymbol and it's fine to give a few more results there...

What scopes will non-scoped enum members have?

Hmm. I think all of them, since you can refer them like that in code too. Case #1 doesn't work but that was the case before this patch so it can probably be addressed separately. I'll add some tests though!

I would vote for making queries En::A and A match the enumerator, but not ::A. The reasoning is: yes, you can reference it this way in a C++ file, but workspaceSymbol is not a real C++ resolve and I think it should match the outline of the code rather than the actual C++ lookup rules.
E.g. I wouldn't expect it to match symbols from base classes, etc. This should also simplify implementation too.

It's probably better to consider this in a future patch. Maybe something like the first suggestion: vector::push_back and match both. Otherwise, I would think it might be a bit too verbose to have to spell out all of the specialization. Maybe we could allow it too. So... all of the above? :)

This certainly does not have to be addressed in this patch. Just wanted to collect opinions on what the behavior we want in the long term.
My thoughts would be towards allowing only vector::push_back and make it match both push_backs: in vector<bool> and vector<T>.
Other cases might work too, but I wouldn't try implementing something that matches specializations. It's just too complicated in the general case. This will only be used in workspaceSymbol and it's fine to give a few more results there...

Sounds very reasonable.

What scopes will non-scoped enum members have?

Hmm. I think all of them, since you can refer them like that in code too. Case #1 doesn't work but that was the case before this patch so it can probably be addressed separately. I'll add some tests though!

I would vote for making queries En::A and A match the enumerator, but not ::A. The reasoning is: yes, you can reference it this way in a C++ file, but workspaceSymbol is not a real C++ resolve and I think it should match the outline of the code rather than the actual C++ lookup rules.
E.g. I wouldn't expect it to match symbols from base classes, etc. This should also simplify implementation too.

I don't have a strong opinion, so I can try this suggestion!

clangd/index/Index.h
160 ↗(On Diff #146391)

Maybe something like bool SupportGlobalCompletion would be better?

Sounds good!

What scopes will non-scoped enum members have?

Hmm. I think all of them, since you can refer them like that in code too. Case #1 doesn't work but that was the case before this patch so it can probably be addressed separately. I'll add some tests though!

I would vote for making queries En::A and A match the enumerator, but not ::A. The reasoning is: yes, you can reference it this way in a C++ file, but workspaceSymbol is not a real C++ resolve and I think it should match the outline of the code rather than the actual C++ lookup rules.
E.g. I wouldn't expect it to match symbols from base classes, etc. This should also simplify implementation too.

I don't have a strong opinion, so I can try this suggestion!

I changed the behavior of non-scoped enums as suggested here: D47223

malaperle updated this revision to Diff 148329.May 23 2018, 7:22 PM

Use "SupportGlobalCompletion", white-list decl contexts, add more tests

The change looks mostly good. Some nits and questions about the testing.

clangd/index/Index.h
158 ↗(On Diff #148329)

s/this is symbol/this symbol/?

clangd/index/SymbolCollector.cpp
158 ↗(On Diff #148329)

(Disclaimer: I don't know obj-c...)

It seems that some objc contexts here are good for global code completion. If we want to support objc symbols, it might also make sense to properly set SupportGlobalCompletion for them.

359 ↗(On Diff #148329)

Could we pull this into a helper function? Something like S.SupportGlobalCompletion = DeclSupportsGlobalCompletion(...)?

unittests/clangd/CodeCompleteTests.cpp
657 ↗(On Diff #148329)

IIUC, this tests the SupportGlobalCompletion filtering logic for index completion? If so, I think it would be more straightforward to do something like:

Sym NoGlobal(...);
NoGlobal.SupportGlobalCompletion = false;
...
completions (Code, {NoGlobal, func(...), cls(...)})
// Expect only global symbols in completion results.

This avoid setting up the ClangdServer manually.

741 ↗(On Diff #148329)

It's not clear to me what the following tests (Enums, AnonymousNamespace, InMainFile) are testing. Do they test code completion or symbol collector? If these test symbol collection, they should be moved int SymbolCollectorTest.cpp

785 ↗(On Diff #148329)

I think the behaviors above are the same before this change, so I'm not quite sure what they are testing. Note that symbols in main files are not collected into dynamic index, and by default the tests do not use dynamic index.

846 ↗(On Diff #148329)

I think this is expected to be empty even for symbols that are not in anonymous namespace because symbols in main files are not indexed.

unittests/clangd/SymbolCollectorTests.cpp
70 ↗(On Diff #148329)

nit: I'd probably call this Global for convenience.

216 ↗(On Diff #148329)

Could you also add checkers for the SupportGlobalCompletion field?

404 ↗(On Diff #148329)

Could you please also add checkers for GlobalCodeCompletion for these enums?

malaperle marked 6 inline comments as done.

Address comments.

clangd/index/SymbolCollector.cpp
158 ↗(On Diff #148329)

Sounds good. Maybe it would be OK to do this in another (small) patch? I also know next to nothing about obj-c :)

unittests/clangd/CodeCompleteTests.cpp
741 ↗(On Diff #148329)

They are testing that code completion works as intended regardless of how symbol collector is implemented. It's similar to our previous discussion in D44882 about "black box testing". I can remove them but it seems odd to me to not have code completion level tests for all cases because we assume that it will behave a certain way at the symbol collection and querying levels.

malaperle added inline comments.May 28 2018, 11:56 AM
unittests/clangd/SymbolCollectorTests.cpp
141 ↗(On Diff #148834)

This allows to override the "-xc++" with something else, i.e. -xobjective-c++

sammccall added inline comments.
clangd/index/Index.h
279 ↗(On Diff #148834)

please also avoid "global" here, e.g. RestrictForCodeCompletion

158 ↗(On Diff #148329)

Sorry, I hadn't seen this patch until now.
When it was part of the workspace/symbol patch, I was the other person concerned this was too coupled to existing use cases and preferred something more descriptive.

I dug up an analysis @hokein and I worked on. One concluseion we came to was that we thought results needed by completion were a subset of what workspace/symbol needs, so a boolean would work. It seemed a bit ad-hoc and fragile though.

The cases we looked at were:

privatememberlocalprimary templatetemplate specializationnested in template
code completeNNNYNN
workspace/symbolYYNYYY
go to defnYY????

(Y = index should return it, N = index should not return it, ? = don't care)

So the most relevant information seems to be:

  • is this private (private member, internal linkage, no decl outside cpp file, etc)
  • is this nested in a class type (or template)
  • is this a template specialization

I could live with bundling these into a single property (though they seem like good ranking signals, and we'd lose them for that purpose), it will certainly make a posting-list based index more efficient.

In that case I think we should have canonical documentation *somewhere* about exactly what this subset is, and this field should reference that.
e.g. isIndexedForCodeCompletion() in CodeComplete.h with docs and IsIndexedForCodeCompletion here. (Please avoid "global" as it has too many different meanings - here we mean "index-based").

malaperle marked an inline comment as done.May 30 2018, 2:35 PM
malaperle added inline comments.
clangd/index/Index.h
158 ↗(On Diff #148329)

OK, I added documentation on the SymbolCollector (which was outdated) about what kinds of symbols are collected, with a reference to shouldFilterDecl. The subset is documented on isIndexedForCodeCompletion(), also referenced from the Symbol field. Was that more or less what you meant?

malaperle updated this revision to Diff 149206.May 30 2018, 2:37 PM

Update with suggestions.

ioeric accepted this revision.Jun 1 2018, 2:33 AM

lgtm

clangd/index/Index.h
158 ↗(On Diff #148329)

I could live with bundling these into a single property (though they seem like good ranking signals, and we'd lose them for that purpose), it will certainly make a posting-list based index more efficient.

FWIW, I think we could still add those signals when we need them in the future. It seems reasonable to me for the code completion flag to co-exist with ranking signals that could potentially overlap.

unittests/clangd/CodeCompleteTests.cpp
741 ↗(On Diff #148329)

FWIW, I am not against those tests at all; increasing coverage is always a nice thing to do IMO. I just thought it would make more sense to add them in a separate patch if they are not related to changes in this patch; I found unrelated tests a bit confusing otherwise.

unittests/clangd/SymbolCollectorTests.cpp
141 ↗(On Diff #148834)

I think this could also be a comment in the code :)

This revision is now accepted and ready to land.Jun 1 2018, 2:33 AM
sammccall accepted this revision.Jun 1 2018, 5:19 AM

Thanks, LG!

clangd/CodeComplete.h
86 ↗(On Diff #149206)

nit: want -> consider?

94 ↗(On Diff #149206)

A little more context:
"// Other symbols still appear in the index for other purposes, like workspace/symbols or textDocument/definition, but are not used for code completion"

clangd/index/SymbolCollector.h
22 ↗(On Diff #149206)

nit: can you change "symbols" here to "declarations"?
currently we rely a bit too heavily on the user knowing what "symbol" means

malaperle updated this revision to Diff 149532.Jun 1 2018, 12:31 PM
malaperle marked 4 inline comments as done.

Address comments.

unittests/clangd/CodeCompleteTests.cpp
741 ↗(On Diff #148329)

I found unrelated tests a bit confusing otherwise.

Fair point!

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.