SymbolCollector will be used for two cases:
- collect Symbol type only, used for indexing preamble AST.
- collect Symbol and SymbolOccurrences, used for indexing main AST.
For finding local references from the AST, we will implement it in other ways.
Paths
| Differential D50385
[clangd] Collect symbol occurrences in SymbolCollector ClosedPublic Authored by hokein on Aug 7 2018, 6:14 AM.
Details
Summary SymbolCollector will be used for two cases:
For finding local references from the AST, we will implement it in other ways.
Diff Detail
Event TimelineHerald added subscribers: arphaman, mgrang, jkorous and 2 others. · View Herald TranscriptAug 7 2018, 6:14 AM Comment Actions 2 high-level questions:
Comment Actions
Storing occurrences in Symbol structure is easy to misuse by users IMO -- if we go through this way, we will end up having a getOccurrences-like method in Symbol structure. Once users get the Symbol instance, it is natural for them to call getOccurrences to get all occurrences of the symbol. However this getOccurrences method doesn't do what users expected (just returning an incomplete set of results or empty). To query the symbol occurrences, we should always use index interface. Therefore, I think we should try to avoid these confusions in the design.
The SymbolOccurrenceCollector has many responsibilities (collecting declaration, definition, code completion information etc), and the code is growing complex now. Merging the SymbolOccurrenceCollector to it will make it more complicated -- we will introduce more option flags like collect-symbol-only, collect-occurrence-only to configure it for our different use cases (we need to the implementation detail clearly in order to make a correct option for SymbolCollector). And I can foresee these two collectors might be run at different point (runWithPreamble vs runWithAST) in dynamic index. They might use same facilities, but we could always share them. Comment Actions
Hmm, I think this is the same for other symbol payload e.g. definition can be missing for a symbol. And it seems to me that the concern is on the SymbolSlab level: if a slab is for a single TU, users should expect missing information; if a slab is merged from all TUs, then users can expect "complete" information. I think it's reasonable to assume that users of SymbolSlab are aware of this. I think it's probably not worth the overhead of maintaining and using two separate slabs.
Although the existing SymbolCollector supports different options, I think it still has a pretty well-defined responsibility: gather information about symbols. IMO, cross-reference is one of the property of symbol, and I don't see strong reasons to keep them separated.
I think these options are reasonable if they turn out to be necessary. And making the SymbolCollector more complicated doesn't seem to be a problem if we are indeed doing more complicated work, but I don't think this would turn into a big problem as logic of xrefs seems pretty isolated. Conversely, I think implementing xrefs in a separate class would likely to cause more duplicate and maintenance, e.g. two sets of options, two sets of initializations or life-time tracking of collectors (they look a lot alike), the same boilerplate factory code in tests, passing around two collectors in user code.
With some options, this should be a problem I think? Comment Actions
I think it's reasonable to keep occurrences away from Symbol's Detail field. Stashing them together is only fine for the collector API, having any way to directly access occurrences through Symbol will be totally confusing for all the other users. On the other hand, SymbolSlab feels like a perfectly reasonable place to store the occurrences in addition to the symbols themselves and it feels we should reuse its memory arena for storing any strings we need to allocate, etc.
+1 to merging into the SymbolCollector. Keeping the responsibilities separate inside a single class should be easy, e.g. something like that should be simple enough: SymbolCollector::handleDeclOccurence(args) { this->processForSymbol(args); // handles keeping the Symbol structure up-to-date, i.e. adds definition locations, etc. this->processForOccurrences(args); // appends occurrences to a list of xrefs. }; The main advantage that we get is less clang-specific boilerplate. The less IndexDataConsumers, FrontendActionFactorys, FrontendActions we create, the more focused and concise our code is. Comment Actions
My concerns of storing occurrences as an extra payload of Symbol are:
I think they are necessary. For collecting all occurrences for local symbols from the AST, we only need symbol occurrence information, other information (e.g. declaration&definition location, #include) should be discarded; Index for code completion should not collect symbol occurrences.
If xrefs is quite isolated, I think it is a good signal to have a dedicated class handling it.
Merging xrefs to SymbolCollector couldn't avoid these problems, I think it is a matter of where we put these code:
The duplication is mainly about AST frontend action boilerplate code. To eliminate it, we could do some refactorings:
Comment Actions Update the patch based on our offline discussion
hokein marked 7 inline comments as done. Comment ActionsUpdate the patch based on our new discussion
hokein retitled this revision from [clangd] Collect symbol occurrences from AST. to [clangd] Collect symbol occurrences in SymbolCollector.Aug 26 2018, 10:42 PM Comment Actions This looks pretty good!
This revision is now accepted and ready to land.Aug 31 2018, 5:35 AM
Revision Contents
Diff 161962 clangd/index/FileIndex.cpp
clangd/index/Index.h
clangd/index/Index.cpp
clangd/index/SymbolCollector.h
clangd/index/SymbolCollector.cpp
unittests/clangd/SymbolCollectorTests.cpp
|
NIT: having friend decls inside the classes themselves might prove to be more readable.
Not opposed to the current one too, feel free to ignore.