This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Introduce ProjectAwareIndex
ClosedPublic

Authored by kadircet on Nov 4 2020, 2:21 AM.

Details

Summary

An index implementation that can dispatch to a variety of indexes
depending on the file path. Enables clangd to work with multiple indexes in the
same instance, configured via config files.

Depends on D90749, D90746

Diff Detail

Event Timeline

kadircet created this revision.Nov 4 2020, 2:21 AM
kadircet requested review of this revision.Nov 4 2020, 2:21 AM
kadircet planned changes to this revision.Nov 4 2020, 2:22 AM

This is a WIP patch. Provided as a way to demonstrate how ExternalBlock will be used in an index implementation.

kadircet updated this revision to Diff 303442.Nov 6 2020, 6:59 AM

The index has the following query semantics:

  • Lookup: it only queries the associated index. This is to not regress

latency for operations like Hover and Go-To.

  • FuzzyFind only queries the associated index when

RestrictForCodeCompletion is set, it queries all otherwise. Again this
is to prevent latency regression for code completion, but make sure we
provide complete results for search operations like WorkspaceSymbol.

  • Refs, queries all. As incomplete results for such operations might

result in bad UX for opeartions like rename. Also having an incomplete
set of references might result in bad impressions.

  • Relations, queries all. Again this is used by operations like type and

call hierarchy and having incomplete results are useless for such
operations.

The index has the following query semantics:

Rehashing offline conversation:

  • These heuristics mostly amount to guessing which high-level operation is being performed. It's a non-obvious place to put this logic, and it's hard to fix without fallout when the guess is wrong.
  • These guesses will get stale. e.g. this queries all indexes for relations, but one index for lookup. We'd like to use both for Hover soon.

Our conclusion was we could query one index by default, and allow callers to override this using a context variable.
The arguments for a context variable vs attributes on request:

  • it naturally applies to batches of requests
  • it's a less invasive change
  • it "hides" the option from other indexes

I'm not that sure about this implementation choice, but it also probably doesn't matter much.

This looks really sensible to me! Nice you managed to reuse Memoize, there's very little ugly caching/locking left.

clang-tools-extra/clangd/index/ProjectAware.cpp
82

I think the cache key needs a bit more consideration.

It's unlikely that a filename will collide with a host/port, but they are currently sitting in the same namespace.

More worryingly, the mountpoint is used in the lambda body but not in the cache key - are we sure this is safe?

The clearest way to enforce this is to make index creation use the cache key only, and not other data from the config. Then you have to include what you're going to use.

82

as discussed offline, allowing the config -> cache key and cache key -> index operations to be injectable is the key to testing I think

101

from memoize:

Concurrent calls for the same key may run the
computation multiple times, but each call will return the same result.

so this side-effect may run multiple times too, resulting in multiple index copies getting leaked. This seems like it may be bad in the case of file indexes.

Why can't the memoize map own the indexes?
This would still result in loading the index multiple times and throwing things away, hmm. Not sure how best to resolve this.

120

consider moving this to unique_ptr<MergedIndex> MergedIndex::create(ArrayRef<unique_ptr<SymbolIndex>>) or so?

Not sure if this is something we'd want to use from ClangdMain...

129

with the helper described above, the Primary ordering could be maybe better expressed as std::stable_partition

clang-tools-extra/clangd/index/ProjectAware.h
31

Is there a reason this class needs to be public?
I think we can expose factories returning SymbolIndex only

kadircet updated this revision to Diff 305695.Nov 17 2020, 1:25 AM
kadircet marked 5 inline comments as done.
  • Address comments
  • Only query the associated index
  • Use ExternalIndexSpec as the cache key
clang-tools-extra/clangd/index/ProjectAware.cpp
101

so this side-effect may run multiple times too, resulting in multiple index copies getting leaked. This seems like it may be bad in the case of file indexes.

The only way i can see this happening is by synchronizing calls to getIndex.

Why can't the memoize map own the indexes?

Main problem is, it returns a copy to the stored elements, rather than a reference to them. Which might be necessary in general to guarantee thread-safety. Since we can't copy unique_ptrs, I had to store them elsewhere and only keep pointers to it within the memoize map.

So both things considered, maybe we should just give up on memoize and have our own thread-safe cache here?

sammccall added inline comments.Nov 19 2020, 5:20 AM
clang-tools-extra/clangd/Config.h
88 ↗(On Diff #305695)

Maybe we could use DenseMap instead? Defining an order on specs seems semantically iffy.

(Maybe one day we'll get to use a better hashtable and DenseMapInfo won't be so gross)

clang-tools-extra/clangd/index/ProjectAware.cpp
101

Yeah given that we know the compute function is going to be fast, we can just run it under the lock. So let's just use a big fat mutex.

138

nit: I think this would be clearer as a named function instead of a lambda

clang-tools-extra/clangd/index/ProjectAware.h
22

nit: "factory" is more common than "generator" for this purpose, probably

26

Mention the default?

The default implementation loads the index asynchronously on the AsyncTaskRunner.
The index will appear empty until loaded.

kadircet updated this revision to Diff 306615.Nov 20 2020, 12:59 AM
kadircet marked 6 inline comments as done.
  • Internalize synchronization of indexstorage rather than using Memoize
sammccall accepted this revision.Nov 20 2020, 3:02 AM
sammccall added inline comments.
clang-tools-extra/clangd/Config.h
95 ↗(On Diff #306615)

(nit: static isn't saving anything here I think: cost of constructing == cost of copying)

This revision is now accepted and ready to land.Nov 20 2020, 3:02 AM
kadircet updated this revision to Diff 306924.Nov 22 2020, 9:11 AM
kadircet marked an inline comment as done.
  • Address comments
This revision was landed with ongoing or failed builds.Nov 22 2020, 12:13 PM
This revision was automatically updated to reflect the committed changes.