The index has the following query semantics:
- Lookup: it only queries the associated index. This is to not regress
latency for operations like Hover and Go-To.
- FuzzyFind only queries the associated index when
RestrictForCodeCompletion is set, it queries all otherwise. Again this
is to prevent latency regression for code completion, but make sure we
provide complete results for search operations like WorkspaceSymbol.
- Refs, queries all. As incomplete results for such operations might
result in bad UX for opeartions like rename. Also having an incomplete
set of references might result in bad impressions.
- Relations, queries all. Again this is used by operations like type and
call hierarchy and having incomplete results are useless for such
Rehashing offline conversation:
- These heuristics mostly amount to guessing which high-level operation is being performed. It's a non-obvious place to put this logic, and it's hard to fix without fallout when the guess is wrong.
- These guesses will get stale. e.g. this queries all indexes for relations, but one index for lookup. We'd like to use both for Hover soon.
Our conclusion was we could query one index by default, and allow callers to override this using a context variable.
The arguments for a context variable vs attributes on request:
- it naturally applies to batches of requests
- it's a less invasive change
- it "hides" the option from other indexes
I'm not that sure about this implementation choice, but it also probably doesn't matter much.
This looks really sensible to me! Nice you managed to reuse Memoize, there's very little ugly caching/locking left.
I think the cache key needs a bit more consideration.
It's unlikely that a filename will collide with a host/port, but they are currently sitting in the same namespace.
More worryingly, the mountpoint is used in the lambda body but not in the cache key - are we sure this is safe?
The clearest way to enforce this is to make index creation use the cache key only, and not other data from the config. Then you have to include what you're going to use.
as discussed offline, allowing the config -> cache key and cache key -> index operations to be injectable is the key to testing I think
so this side-effect may run multiple times too, resulting in multiple index copies getting leaked. This seems like it may be bad in the case of file indexes.
Why can't the memoize map own the indexes?
consider moving this to unique_ptr<MergedIndex> MergedIndex::create(ArrayRef<unique_ptr<SymbolIndex>>) or so?
Not sure if this is something we'd want to use from ClangdMain...
with the helper described above, the Primary ordering could be maybe better expressed as std::stable_partition
Is there a reason this class needs to be public?
- Address comments
- Only query the associated index
- Use ExternalIndexSpec as the cache key
The only way i can see this happening is by synchronizing calls to getIndex.
Main problem is, it returns a copy to the stored elements, rather than a reference to them. Which might be necessary in general to guarantee thread-safety. Since we can't copy unique_ptrs, I had to store them elsewhere and only keep pointers to it within the memoize map.
So both things considered, maybe we should just give up on memoize and have our own thread-safe cache here?
Maybe we could use DenseMap instead? Defining an order on specs seems semantically iffy.
(Maybe one day we'll get to use a better hashtable and DenseMapInfo won't be so gross)
Yeah given that we know the compute function is going to be fast, we can just run it under the lock. So let's just use a big fat mutex.
nit: I think this would be clearer as a named function instead of a lambda
nit: "factory" is more common than "generator" for this purpose, probably
Mention the default?
The default implementation loads the index asynchronously on the AsyncTaskRunner.