This reduces memory usage by dynamic index from more than 400MB to 32MB
when all files in clang-tools-extra/clangd/*.cpp are active in clangd.
Details
- Reviewers
sammccall - Commits
- rGdffa9dfbda56: [clangd] Shard preamble symbols in dynamic index
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Can't believe we didn't do this before. Nice catch.
clang-tools-extra/clangd/index/FileIndex.cpp | ||
---|---|---|
103 | This screams out to be shared with BackgroundIndex::update(). Is it very hard to do? Things that are different:
I imagine factoring out a function that returns a StringMap<File> like in BackgroundIndex::Update, but File has a build() function that returns an IndexFileIn or so. | |
118 | um... I guess that works, because Expected is unchecked in NDEBUG? | |
422 | nit: main-file | |
clang-tools-extra/clangd/index/FileIndex.h | ||
129–130 | This seems like a caveat rather than a FIXME - we're not really proposing a fix are we? |
clang-tools-extra/clangd/index/FileIndex.cpp | ||
---|---|---|
103 | In backgroundindex we "prefilter" symbols and only duplicate the ones coming from modified files, whereas in this one we duplicate every symbol no matter what. I didn't want to change backgroundindex into post filtering mode (I suppose that's what you mean by the last item). as for the first bullet point, duplicating in case of preamble symbols wouldn't matter since we don't merge(rather pick one) while building the index for preamblesymbols. I suppose storing them twice shouldn't be a huge issue. Depending to IndexFileIn in "Index.h" would be a violation, but depending on it in "FileIndex.h" is fine. I am happy to refactor a StringMap<File> shardIndexToFiles(IndexFileIn, HintPath) into "FileIndex.h" that can be used by both backgroundindex and this, if you are OK with above mentioned regressions. |
- Unify sharding logic in BackgroundIndex and FileIndex.
- Make sure relations have valid subjects.
clang-tools-extra/clangd/index/FileIndex.cpp | ||
---|---|---|
103 | implemented a version of this that only copies data when requested. Hence the pre/post filtering argument is gone. |
clang-tools-extra/clangd/index/SymbolCollector.cpp | ||
---|---|---|
286 | put this before processRelations to ensure Subject of a relation is a symbol we are going to collect. |
LG, API quibbles.
It would be nice to test this but the good things about it (memory usage) aren't easy to test, and the bad things (sensitivity to header parsing context) seem odd to be the primary test.
Actually I guess one nice thing we get from this is erasure of stale symbols: if we index the preamble {foo.cpp=#include "x.h", x.h=int a;}, and then {bar.cpp=#include "x.h", x.h=int b;}, then I think before this patch we see both a and b in the index, and after we see only b.
That would be a reasonable test.
Or I guess now that the sharding function is public we can test it directly.
clang-tools-extra/clangd/index/FileIndex.h | ||
---|---|---|
129–130 | nit: header | |
157 | nit: space before parenthetical | |
159 | Hmm, this is a good name for a function, but we want a noun for a class. FileShardedIndex? | |
165 | I do find it a little weird not to expose the map-structure of the vast majority of the data here. What steers you away from just making this a function | |
172 | Can we bundle these parallel functions into one that returns IndexFileIn? That would also mean no need to special-case mainfilecmd in the API - the only caller of that function does it in the file loop, so it would be provided for the main-file shard only. | |
clang-tools-extra/clangd/index/SymbolCollector.cpp | ||
286 | The code around this seems unchanged - how does this relate to the rest of the patch? This is bit subtle, and hard to reason about since we only have one example. Indeed dangling relations are pointless. However:
I think probably we should give this a comment (about not wanting relations or refs either), add filtering on the relation objects, and add a test (I think that should be pretty easy with inheriting from a main-file-only class?). |
- Address comments
- Merge slab generations into a single member that returns an IndexFileIn.
clang-tools-extra/clangd/index/FileIndex.h | ||
---|---|---|
165 | because the map structure is just pointers into the IndexFileIn, I wanted to make sure those pointers do not outlive the IndexFileIn. | |
clang-tools-extra/clangd/index/SymbolCollector.cpp | ||
286 |
Selection of file declaring the Subject seemed arbitrary, and turned out to be broken. We were not hitting this case before because background index was performing pre filtering and skipping any relation that mapped to a "non interesting" file, hence we didn't notice it was broken. Now it become obvious as we assert on the file containing the subject symbol.
I also wanted to do this but it got hairy and as you also have suggested I decided to have a separate patch for this. Because there were other problematic stuff, for example IndexerOpts::CollectMainFileSymbols is always true. |
Still LG
clang-tools-extra/clangd/index/FileIndex.h | ||
---|---|---|
165 | I don't think it's unreasonable to have a non-owning data structure with a comment, but up to you. |
This seems like a caveat rather than a FIXME - we're not really proposing a fix are we?