This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement findOccurrences interface in dynamic index.
ClosedPublic

Authored by hokein on Aug 26 2018, 10:31 PM.

Diff Detail

Event Timeline

hokein created this revision.Aug 26 2018, 10:31 PM

Some numbers of memory usage from running this on my machine:

FilePreamble ASTMain AST
SemaDecl.cpp14.5MB108.1KB (symbols) + 16.5KB (occurrences)
CodeComplete.cpp15.4MB53.9KB (symbols) + 7.3KB (occurrences)

The memory usage of occurrences is relatively small, I think it is fine to enable it by default.

The memory usage looks good. Some NITs and a major consideration wrt to the implementation of merging occurences from dynamic and static index.

clangd/index/FileIndex.cpp
48

There is an implicit assumption (TopLevelDecls != null iff indexing the main file AST in dynamic index) that does not seem quite obvious.
Maybe add two overloads (for indexing the main AST and for indexing the preamble AST) or add an extra parameter to be explicit more about those choices?

80

NIT: maybe remove braces around single-statement branches?

145–149

Why not std::tie(*Slab, *OccurenceSlab) = indexAST(...)?
I assume this should work and give the optimal performance

clangd/index/FileIndex.h
43

Maybe avoid default arguments? Having clients pass nullptr explicitly seems like the right thing to do

105

Maybe use a struct to allow named fields? Would mean a tiny amount of extra code in indexAST, but should make the client code slightly nicer.

clangd/index/MemIndex.h
24

Maybe remove the default parameter? Making the callers specify the occurrences explicitly could make their code more straight-forward to follow.

clangd/index/Merge.cpp
88

Do we want to fix it before checking this in? As discussed offline, this seems to be a limitation that could affect the design (FileOccurrences vs SymbolOccurrence in the symbol interface). In case we want to avoid refactoring the interfaces later.

95

llvm::is_contained seems to be linear time, maybe replace with hash-table lookup.
Or am I missing something?

Just noticed I'm not on the reviewers list, sorry for chiming in without invitation. Was really excited about the change :-)

Just noticed I'm not on the reviewers list, sorry for chiming in without invitation. Was really excited about the change :-)

Comments are always welcome :)

clangd/index/Merge.cpp
88

Yes, I re-thought it afterwards. We could solve this issue for findOccurrences by redefining the interface.

Unfortunately, we have the same issue for fuzzyFind, lookup too -- we still return the stale symbols from static index if the symbol was deleted in dirty files. I think we should address this issue thoroughly and separately -- we probably want to add a method in dynamic index for query whether a file is in.

Just noticed I'm not on the reviewers list, sorry for chiming in without invitation. Was really excited about the change :-)

fixed :-)

clangd/index/FileIndex.cpp
59–70

combine into one log statement

clangd/index/FileIndex.h
51

This return type seems inconsistent with the design of this class.
The purpose of the shared_ptr<vector<symbol_slab>> in allSymbols is to avoid exposing the concrete storage.
The analogue would I guess be shared_ptr<DenseMap<SymbolID, vector<SymbolOccurrence*>>>.

The cost of creating that does seem a little gross though.

51

allOccurrences
(name should reflect the semantics, not the type)

59

nit: no need for brief unless the "first sentence" heuristic fails. (also, it's misspelled!)

100

update doc

clangd/index/MemIndex.cpp
58–59

we should find a way to avoid having functions implicitly zero out parts of the index like this!

109

This assumes there is no duplication across occurrences, but gives no mechanism through which deduplication could occur (because we're coupled to the underlying storage).

If FileIndex provides the DenseMap<SymbolID, vector<SymbolOccurrence*>> then it could just refrain from providing duplicate pointers.

clangd/index/MemIndex.h
24

(type and name: same comment as on FileIndex)

clangd/index/Merge.cpp
84–103

I'm tempted to say remove this heuristic for now (as you say, we should solve this thoroughly), but the artifacts will be severe, so it's probably correct to include it.

But this comment doesn't explain the heuristic, it just echoes the code.
I'd suggest something like:

// We don't want duplicate occurrences from the static/dynamic indexes,
// and we can't reliably deduplicate because offsets may differ slightly.
// We consider the dynamic index authoritative and report all its occurrences,
// and only report static index occurrences from other files.
// FIXME: this heuristic fails if the dynamic index contains a file, but all
// occurrences were removed (we will report stale ones from static).
// Ultimately we should explicitly check which index has the file instead.
85

SeenFiles -> DynamicIndexFiles?

88

Yes. Please document this limitation on the MergedIndex class too since it applies to everything. It should still be documented here because it's a limitation of the particular heuristic we're using here.

hokein updated this revision to Diff 163064.Aug 29 2018, 6:00 AM
hokein marked 16 inline comments as done.

Address review comments.

hokein updated this revision to Diff 163066.Aug 29 2018, 6:01 AM

Minor cleanup.

Harbormaster completed remote builds in B22053: Diff 163066.

Thanks for the comments.

clangd/index/FileIndex.cpp
48

This is a hacky way to detect whether we are indexing main AST or preamble AST -- because the current FileIndex::update(PathRef Path, ...) interface doesn't contain this information, and we use it both for PreambleAST and MainAST indexing in the client side (ParsingCallback).

I think we might want two dedicated methods (updateMainAST, and updatePreambleAST) in FileSymbol, and I think we should address it in a follow-up patch. WDYT?

clangd/index/FileIndex.h
51

The cost of creating that does seem a little gross though.

Previously I tried to avoid the cost of iterating/copying all occurrences. After some investigations, it seems that the number of symbol occurrences is small (~500 for SemaDecl.cpp, < 100 for CodeComplete.cpp). The cost should not be too expensive.

105

I think the type of pair explicitly express its meaning well enough? And this function is exposed for unittests, I'm happy to do the change if you insist.

clangd/index/MemIndex.cpp
58–59

Made it explicit.

clangd/index/Merge.cpp
95

Good catch, I thought it had a trait for Map, Set, but it turns out that is_contained is only a std::find wrapper :(

ilya-biryukov added inline comments.Aug 29 2018, 6:28 AM
clangd/index/FileIndex.cpp
48

Either a dedicated methods or a separate parameter LG.
And it's probably ok to do it with a follow-up change if you want to keep the scope of this patch smaller. That should be a pretty small change overall, so it should be fine either way.

clangd/index/FileIndex.h
105

Not really, that was just a suggestion. Feel free to ignore if the current version looks better to you.

sammccall accepted this revision.Aug 31 2018, 3:48 AM

This basically looks good to go (some fixes needed but they're pretty clear I think let me know if not!)

clangd/index/FileIndex.cpp
61

for this log to be useful, you might want to include the filename.
You can get it from the sourcemanager on the ASTContext.

clangd/index/MemIndex.cpp
55–56

This is still implicitly creating an index with no occurrences. Did you mean to accept a SymbolOccurrencesSlab here?

clangd/index/MemIndex.h
22–23

can you add an explanation to this?

clangd/index/Merge.cpp
94

Unfortunately this is not safe, the backing strings may not live long enough.
This should be a StringSet (by value) instead.

100

or just DynamicIndexFileURIs.count(O.Location.FileURI) and rely on the implicit conversion to bool

clangd/index/SymbolCollector.cpp
227 ↗(On Diff #163066)

toOccurrenceKind

isn't this just

return Roles & AllOccurrenceKinds with some casts?
(I'd put AllOccurrenceKinds into the header, but it could also be a constant here)

clangd/index/SymbolCollector.h
120 ↗(On Diff #163066)

move these up next to ReferencedDecls and ReferencedMacros so the comment applies to them too?

Similarly, move SymbolOccurrenceSlab next to the SymbolSlab::Builder? These have strong parallels.

122 ↗(On Diff #163066)

I'm not sure this works - IIRC can use a SymbolCollector for multiple TUs, with finish() called at the end of each one.
I think you need to (incrementally) build in finish(), and freeze in takeOccurrences().

unittests/clangd/TestTU.h
41 ↗(On Diff #163066)

We've avoided adding this in the past because it's less readable. Please assign the fields separately instead.

This revision is now accepted and ready to land.Aug 31 2018, 3:48 AM
hokein marked 3 inline comments as done.Aug 31 2018, 5:29 AM

Sorry! Just realised I messed up this patch with D50385 (mostly SymbolCollector changes), all the comments about SymbolCollector are fixed in D50385.

hokein updated this revision to Diff 163531.Aug 31 2018, 7:23 AM
hokein marked 4 inline comments as done.
  • rebase
  • address review comments
hokein updated this revision to Diff 163532.Aug 31 2018, 7:27 AM

Minor cleanup.

clangd/index/MemIndex.cpp
55–56

This is intended, this function only cares about symbol, no occurrences.

unittests/clangd/TestTU.h
41 ↗(On Diff #163066)

I'm tempted to keep it here, while it's less readable, it does more things, and can make client code more readable (see newly-added tests), otherwise we have to repeat these statements in a few places.

sammccall added inline comments.Aug 31 2018, 7:53 AM
unittests/clangd/TestTU.h
41 ↗(On Diff #163066)

That's what I mean, I find the newly-added tests hard to read, because "allcode" isn't clear (particularly when one of the params is a filename) and the order of parameters is not obvious.
I think they would be much clearer with the fields assigned individually as the existing tests do.

hokein updated this revision to Diff 163539.Aug 31 2018, 8:37 AM
hokein marked 3 inline comments as done.

address review comments:

  • build memindex with symbol slab and occurrence slab
  • remove withAllCode in TestTU
This revision was automatically updated to reflect the committed changes.