
[clangd] Introduce ProjectAwareIndex

Authored by kadircet on Nov 4 2020, 2:21 AM.



An index implementation that can dispatch to a variety of indexes
depending on the file path. Enables clangd to work with multiple indexes in the
same instance, configured via config files.

Depends on D90749, D90746


Event Timeline

kadircet created this revision.Nov 4 2020, 2:21 AM
kadircet requested review of this revision.Nov 4 2020, 2:21 AM
kadircet planned changes to this revision.Nov 4 2020, 2:22 AM

This is a WIP patch. Provided as a way to demonstrate how ExternalBlock will be used in an index implementation.

kadircet updated this revision to Diff 303442.Fri, Nov 6, 6:59 AM

The index has the following query semantics:

  • Lookup: only queries the associated index. This is to not regress latency for operations like Hover and Go-To.
  • FuzzyFind: only queries the associated index when RestrictForCodeCompletion is set; it queries all indexes otherwise. Again this is to prevent a latency regression for code completion, while making sure we provide complete results for search operations like WorkspaceSymbol.
  • Refs: queries all indexes, as incomplete results for such operations might result in bad UX for operations like rename. Also, having an incomplete set of references might leave a bad impression.
  • Relations: queries all indexes. Again, this is used by operations like type and call hierarchy, and incomplete results are useless for such operations.
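A minimal sketch of the FuzzyFind dispatch rule above. The types here are simplified stand-ins for clangd's SymbolIndex and FuzzyFindRequest, not the actual API:

```cpp
#include <string>
#include <vector>

// Hypothetical simplified types; real clangd uses SymbolIndex/FuzzyFindRequest.
struct FuzzyFindRequest {
  std::string Query;
  bool RestrictForCodeCompletion = false;
};

struct Index {
  std::string Name;
  std::vector<std::string> Symbols;
};

// Returns the indexes a FuzzyFind request should hit: only the associated
// index when RestrictForCodeCompletion is set (code completion is
// latency-sensitive), all indexes otherwise (e.g. WorkspaceSymbol wants
// complete results).
std::vector<const Index *>
indexesForFuzzyFind(const FuzzyFindRequest &Req, const Index &Associated,
                    const std::vector<Index> &All) {
  if (Req.RestrictForCodeCompletion)
    return {&Associated};
  std::vector<const Index *> Result;
  for (const auto &I : All)
    Result.push_back(&I);
  return Result;
}
```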

Rehashing offline conversation:

  • These heuristics mostly amount to guessing which high-level operation is being performed. It's a non-obvious place to put this logic, and it's hard to fix without fallout when the guess is wrong.
  • These guesses will get stale. e.g. this queries all indexes for relations, but one index for lookup. We'd like to use both for Hover soon.

Our conclusion was we could query one index by default, and allow callers to override this using a context variable.
The arguments for a context variable vs attributes on request:

  • it naturally applies to batches of requests
  • it's a less invasive change
  • it "hides" the option from other indexes

I'm not that sure about this implementation choice, but it also probably doesn't matter much.
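The context-variable idea could look roughly like the following. clangd's real implementation would use its Context/Key machinery; the thread-local flag and the names (QueryAllIndexes, WithAllIndexes, shouldQueryAll) here are hypothetical simplifications:

```cpp
// Hypothetical sketch: a scoped override letting callers opt a whole batch of
// index queries into "query all indexes" behavior, without adding a flag to
// each request type.
thread_local bool QueryAllIndexes = false;

// RAII guard: all index queries issued while it is alive hit every index.
struct WithAllIndexes {
  bool Prev;
  WithAllIndexes() : Prev(QueryAllIndexes) { QueryAllIndexes = true; }
  ~WithAllIndexes() { QueryAllIndexes = Prev; }
};

// The dispatching index consults this instead of per-request attributes.
bool shouldQueryAll() { return QueryAllIndexes; }
```

Because the override lives in ambient state rather than on the request, it naturally covers batches of requests and stays invisible to index implementations that don't care about it.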

This looks really sensible to me! Nice you managed to reuse Memoize, there's very little ugly caching/locking left.


I think the cache key needs a bit more consideration.

It's unlikely that a filename will collide with a host/port, but they are currently sitting in the same namespace.

More worryingly, the mountpoint is used in the lambda body but not in the cache key - are we sure this is safe?

The clearest way to enforce this is to make index creation use the cache key only, and not other data from the config. Then you have to include what you're going to use.
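The discipline suggested here — derive the cache key from the config, then build the index from the key alone — can be sketched like this. The field layout of ExternalIndexSpec is an assumption for illustration, not clangd's actual definition:

```cpp
#include <memory>
#include <string>
#include <tuple>

// Hypothetical sketch: every field the factory reads is part of the key, so
// two configs differing in any creation-relevant detail get distinct entries.
struct ExternalIndexSpec {
  int Kind = 0;           // e.g. 0 = file, 1 = server (avoids the namespace collision)
  std::string Location;   // filename or host:port, disambiguated by Kind
  std::string MountPoint; // used during creation, so it must be in the key
  bool operator<(const ExternalIndexSpec &O) const {
    return std::tie(Kind, Location, MountPoint) <
           std::tie(O.Kind, O.Location, O.MountPoint);
  }
};

struct Index { ExternalIndexSpec Spec; };

// Creation takes only the key: it cannot accidentally depend on config data
// that the cache would fail to distinguish.
std::unique_ptr<Index> createIndex(const ExternalIndexSpec &Key) {
  return std::make_unique<Index>(Index{Key});
}
```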


as discussed offline, allowing the config -> cache key and cache key -> index operations to be injectable is the key to testing I think


from memoize:

Concurrent calls for the same key may run the
computation multiple times, but each call will return the same result.

so this side-effect may run multiple times too, resulting in multiple index copies getting leaked. This seems like it may be bad in the case of file indexes.

Why can't the memoize map own the indexes?
This would still result in loading the index multiple times and throwing things away, hmm. Not sure how best to resolve this.
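The hazard under discussion shows up in a condensed memoize along these lines (a simplified analog of clangd's Memoize, not its actual code): the computation runs outside the lock, so two racing callers may both run it, and only one result survives.

```cpp
#include <map>
#include <mutex>
#include <utility>

// Simplified memoize sketch. Returning by value forces V to be copyable
// (hence no unique_ptr values), and running Compute outside the lock means
// concurrent calls for the same key may compute twice; the loser's result is
// discarded, which leaks if Compute had side effects like loading an index.
template <typename K, typename V> class Memoize {
  std::mutex Mu;
  std::map<K, V> Cache;

public:
  template <typename Fn> V get(const K &Key, Fn Compute) {
    {
      std::lock_guard<std::mutex> Lock(Mu);
      auto It = Cache.find(Key);
      if (It != Cache.end())
        return It->second; // a copy of the stored value
    }
    V Result = Compute(); // outside the lock: may race with another caller
    std::lock_guard<std::mutex> Lock(Mu);
    // First writer wins; a concurrent duplicate result is thrown away here.
    return Cache.try_emplace(Key, std::move(Result)).first->second;
  }
};
```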


consider moving this to unique_ptr<MergedIndex> MergedIndex::create(ArrayRef<unique_ptr<SymbolIndex>>) or so?

Not sure if this is something we'd want to use from ClangdMain...


with the helper described above, the Primary ordering could be maybe better expressed as std::stable_partition
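In isolation, the stable_partition suggestion amounts to something like this; the function name and the string stand-ins for indexes are hypothetical:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Move the primary (associated) index to the front while preserving the
// relative order of the remaining indexes.
std::vector<std::string>
orderWithPrimaryFirst(std::vector<std::string> Indexes,
                      const std::string &Primary) {
  std::stable_partition(Indexes.begin(), Indexes.end(),
                        [&](const std::string &I) { return I == Primary; });
  return Indexes;
}
```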


Is there a reason this class needs to be public?
I think we can expose factories returning SymbolIndex only

kadircet updated this revision to Diff 305695.Tue, Nov 17, 1:25 AM
kadircet marked 5 inline comments as done.
  • Address comments
  • Only query the associated index
  • Use ExternalIndexSpec as the cache key

so this side-effect may run multiple times too, resulting in multiple index copies getting leaked. This seems like it may be bad in the case of file indexes.

The only way I can see to prevent this is by synchronizing calls to getIndex.

Why can't the memoize map own the indexes?

The main problem is that it returns a copy of the stored elements, rather than a reference to them, which might be necessary in general to guarantee thread-safety. Since we can't copy unique_ptrs, I had to store them elsewhere and keep only pointers to them within the memoize map.

So both things considered, maybe we should just give up on memoize and have our own thread-safe cache here?

sammccall added inline comments.Thu, Nov 19, 5:20 AM

Maybe we could use DenseMap instead? Defining an order on specs seems semantically iffy.

(Maybe one day we'll get to use a better hashtable and DenseMapInfo won't be so gross)


Yeah given that we know the compute function is going to be fast, we can just run it under the lock. So let's just use a big fat mutex.
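The "big fat mutex" conclusion could look like this sketch: a map that owns the indexes, with the (cheap) factory run under the lock. Names are hypothetical, not the landed code:

```cpp
#include <map>
#include <memory>
#include <mutex>

// Hand-rolled thread-safe cache replacing Memoize: the map owns the indexes
// (no copies needed, no duplicate creation), and callers get stable raw
// pointers into it.
template <typename K, typename V> class OwningCache {
  std::mutex Mu;
  std::map<K, std::unique_ptr<V>> Cache;

public:
  template <typename Factory> V *get(const K &Key, Factory Create) {
    std::lock_guard<std::mutex> Lock(Mu);
    auto &Slot = Cache[Key];
    if (!Slot)
      Slot = Create(); // fast by assumption, so holding the lock is fine
    return Slot.get(); // stable: a node-based map never moves elements
  }
};
```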


nit: I think this would be clearer as a named function instead of a lambda


nit: "factory" is more common than "generator" for this purpose, probably


Mention the default?

The default implementation loads the index asynchronously on the AsyncTaskRunner.
The index will appear empty until loaded.
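The "empty until loaded" behavior can be sketched as a swappable index slot: queries see an empty stub until the background load swaps in the real data. This is a condensed, hypothetical analog, not clangd's actual SwapIndex:

```cpp
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Stand-in for a loaded symbol index.
struct IndexData {
  std::vector<std::string> Symbols;
};

// Starts as an empty stub; the async loader calls reset() when done.
// Readers take a shared_ptr snapshot, so a swap never invalidates an
// in-flight query.
class SwappableIndex {
  mutable std::mutex Mu;
  std::shared_ptr<IndexData> Current = std::make_shared<IndexData>();

public:
  std::shared_ptr<IndexData> snapshot() const {
    std::lock_guard<std::mutex> Lock(Mu);
    return Current;
  }
  void reset(std::shared_ptr<IndexData> New) {
    std::lock_guard<std::mutex> Lock(Mu);
    Current = std::move(New);
  }
};
```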

kadircet updated this revision to Diff 306615.Fri, Nov 20, 12:59 AM
kadircet marked 6 inline comments as done.
  • Internalize synchronization of index storage rather than using Memoize
sammccall accepted this revision.Fri, Nov 20, 3:02 AM
sammccall added inline comments.

(nit: static isn't saving anything here I think: cost of constructing == cost of copying)

This revision is now accepted and ready to land.Fri, Nov 20, 3:02 AM
kadircet updated this revision to Diff 306924.Sun, Nov 22, 9:11 AM
kadircet marked an inline comment as done.
  • Address comments
This revision was landed with ongoing or failed builds.Sun, Nov 22, 12:13 PM
This revision was automatically updated to reflect the committed changes.