The implementation is very close the the incoming
calls implementation. The results of the outgoing
calls are expected to be the exact symmetry of the
Thanks for working on this!
I haven't looked at the whole patch in detail, but I looked at some parts (mainly the SymbolIndex API and what in-memory structures we store to implement it) and wrote a few thoughts.
If memory usage of Dex::RevRefs becomes a concern, we could consider only storing the references that would pass this filter in the first place. That would trade off time spent building the reverse lookup (we'd have to do symbol lookups or keep around extra state to track the symbol kinds) for space savings.
As the protocol wants the outgoing calls grouped by symbol, we could consider storing them (and having the SymbolIndex API expose them) in that way.
The main motivation would be space savings on the storage side (Dex::RevRefs), as in the case of multiple calls to the same function we only need to store the target SymbolID once.
However, it may not be worth doing this, as having a large number of outgoing calls to the same target inside a given function is likely to be rare, and vectors have their own overhead. (It's also not clear to what extent the details of the LSP protocol should dictate the shape of the SymbolIndex API.)
Perhaps we should have a distinct request structure, for future extensibility?
Looping over all refs seems expensive even for MemIndex. Perhaps we should have a reverse lookup table similar to RevRefs here as well?
I've used tried this patch with the whole llvm project indexed. If I remember correctly it's something in the like of 100MB, around 5-10% même usage of clangd. As such it's not very high but it's definitely not negligible, especially for a feature that's not used very often.
I indeed believe that I'd use more memory than it would save, but I did not try it.
Sure, that makes sense
That's pretty fast even on my laptop IIRC, and only used for outgoing calls atm.
Thanks for looking into this feature!
I do have to ask a fairly naive question though - what's it useful for, beyond protocol completeness? The obvious alternative to a user is looking at a function's implementation, which is an extra context switch but also provides a lot more information. I'm having trouble understanding when this could be better enough to justify much extra complexity/resources.
(Adding a new method to the index interface seems like a fairly substantial thing, and 5-10% to index memory usage is substantial as you say).
I think that, as with incoming calls, the incremental value is in seeing the tree at a glance. So, you might query the outgoing calls for a function, expand the tree, and look over the transitive callees to see if the function ends up performing a certain type of operation (say, a filesystem access).
That said, I will admit that in ~10 years of using Eclipse CDT, which supports both incoming calls and outgoing calls, I can't recall using outgoing calls even once (whereas I did use incoming calls). Of course, other users' experiences may vary.
You're right, I probably got carried away for the sake of completeness. Allocating extra memory for a feature nobody is going to use is definitely not worth it. Though maybe we can take advantage of what's already been done and if we remove the RevRefs map, and pay the full price of iterating all refs when refersTo is called, we still have this feature. Although slower, it shouldn't take more than a few 100's of milliseconds even on a big project. Which means the feature is usable, at least for most small to medium sized projects, at no extra cost for people who do not use it.
Would this be acceptable ?
Time to wake up an old review!
Going to put high-level thoughts on https://github.com/clangd/clangd/discussions/1206, but since one of them is memory usage I wanted to have a look at the concrete data structure here.
TL;DR: I think we can make this cheap enough we don't care about the extra memory. The simple lookup optimization below might be enough (probably not). Dropping non-callee refs from the data structure is likely enough, the deep lookup optimization is likely enough, doing both probably reduces memory usage vs today!
Lookup optimization (simple):
DenseMap<SymbolID, vector<RevRef>> is something like 56*containers + 16*refs bytes:
- a `pair<SymbolID, vector<RevRef>, which is 4 pointers per symbol, plus extra buckets so >7 pointers if I'm reading DenseMap right
- a RevRef per entry which is two pointers
- maybe some allocator overhead for all those little vectors? (IDK how allocators work in practice)
Dropping the hashtable and just storing a flat vector<RevRef> sorted by container eliminates the per-symbol costs for 16*refs bytes. If there were only 3-4 refs per function this would save 50%, but there's probably more. The tradeoff is binary search (with double-indirection for compare) instead of hash lookup, but still seems fast enough for what we want it for.
The elephant in the room is that most of these refs we're storing are not callees. If we were to drop those from the lookup structure as discussed inline above, then the ratio of refs:symbols goes down and this optimization becomes relatively more effective.
Lookup optimization (deeper): h/t @kadircet
I think it's easiest to explain as a pair of modifications to the current refs data structures (as checked-in, not the ones in this patch).
First: imagine we change RefSlab to produce:
- a big flat vector<Ref> grouped by target.
- a lookup structure consisting of a sorted vector<pair</*Target*/SymbolID, /*Offset*/int>>
This is smaller than our current representation: the lookup structure is 20 * targets bytes instead of the current ~40 * targets bytes and making the ref storage contiguous is ~free.
Clearly this supports lookup of refs for a target by binary searching over the lookup structure by target.
But observe that also given a Ref* we can find its target: binary search over the lookup structure searching for the right offset, rather than target.
Second: have the index compute a permutation of refs grouped by Container. Naively this is a vector<Ref*>, but since we made the array flat it can be vector<int>
Now we can look up the refs for a given container by binary searching within this permutation, as the container is stored in the Refs themselves.
For each result we have to do a second binary search in the lookup structure to get the target symbol, but we expect the #symbols per container to be small so this nesting isn't a great concern.
So the net cost vs HEAD is 4 * refs - 20 * targets bytes, which is less than a quarter of the version above. This takes us to ~2% increase which we can eat even if we store this info for all references.
The costs here are expensive reverse lookups (I don't think we care), slightly more expensive forward lookups (also OK I think), code complexity (*mostly* localized). Obviously this is all somewhat invasive...
(We discussed an even further squeeze: we could use the same trick to avoid our current 8 bytes inside the ref for the container. I think this leads to too many levels of nested binary searches)
RefKind is a bitfield, so I think we could just tack on an extra Call bit at indexing time. Technically it's extra state, but it's cheap to compute and free to plumb around.
Then the tradeoff becomes: the index needs to know which refkinds to include in its reverse-lookup structure. In practical terms we could hard-code this, it's an ugly layering violation though. The prettiest version I can think of is to put a constant on RefersToRequest for the supported refkinds.
(A ridiculous answer that almost works is to build it dynamically on first use, based on the queried refkinds: this means only people who use outgoing call hierarchy would pay the memory cost! Unfortunately I think this gets too complicated when we consider threadsafety)
+1, also for clarity (we're changing the meaning of IDs too).
I'd also consider naming this request something more explicitly paired with refs, like containedRefs, reverseRefs, outgoingRefs. They're not great names in isolation, but I think the idea that this is a second query on the same data is worth emphasizing.
MemIndex basically doesn't scale: it does fuzzyfind by iterating over all symbols. So this sounds likely never to matter in practice.
I'm interested but I don't have enough free time these days. So if anyone wants to take it from there, feel free to reuse my commits (or not).
If I find myself with some free time and nobody else is working on this, I'll give it a shot
To be able to reason about the impact of various memory usage optimizations, I took some baseline measurements.
I used a workload consisting of the LLVM codebase, with the compile_commands.json entries filtered to those containing clang/lib or clang-tools-extra/clangd, a total of 1082 source files. (This is what I happen to use for local clangd development, but I also think it strikes a good balance between being large enough to be representative but not so large that taking local measurements is too much of a hassle.)
I measured the memory usage of the background index by performing the clangd: show memory usage command.
Without this patch (baseline): background_index 560MB (index 372MB, slabs 187MB)
With the patch in its current state: background_index 606MB (index 418MB, slabs 187MB)
This is an increase of (606 - 560) / 560 = 8.2% over baseline, which is consistent with previous reports of 5-10%.
My plan going forward is to implement some of the suggested memory usage optimizations, and measure their impact.
The updated patch implements one of the optimizations discussed during review, namely filtering the Refs stored in the RevRefs data structure to just those that could be calls.
To this end, the patch introduces a new RefKind, RefKind::Call, as discussed.
With the updated patch, memory usage on the same workload is: background_index 579MB (index 392MB, slabs 187MB).
This is an increase of (579 - 560) / 560 = 3.4% over baseline (down from 8.2% with the original patch!)
The updated patch additionally implements the "simple lookup optimization" discussed in the review.
With this version, memory usage on the test workload is: background_index 574MB (index 387MB, slabs 187MB).
This is an increase of (574 - 560) / 560 = 2.5% over baseline (down from 8.2% with the original patch)
@sammccall, how would you feel about proceeding with the patch in its current state, with the memory usage increase brought down from 8.2% to 2.5% thanks to the combination of the simple lookup optimization + RefKind filtering, and leaving the "deep lookup optimization" to be explored in a future change?