This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Store preamble macros in dynamic index.
ClosedPublic

Authored by ioeric on Sep 14 2018, 1:19 AM.

Details

Summary

Pros:
o Loading macros from preamble for every completion is slow (see profile).
o Calculating macro USR is also slow (see profile).
o Sema can provide a lot of macro completion results (e.g. when filter is empty,
60k for some large TUs!).

Cons:
o Slight memory increase in dynamic index (~1%).
o Some extra work during preamble build (should be fine as preamble build and
indexAST is way slower).

Before:

After:

Event Timeline

ioeric created this revision.Sep 14 2018, 1:19 AM
ioeric updated this revision to Diff 165444.Sep 14 2018, 1:29 AM
  • minor cleanup
ioeric edited the summary of this revision. (Show Details)Sep 14 2018, 1:33 AM

~1% increase in memory usage seems totally fine. Actually surprised it's that small.
Overall LG, just a single comment about extra copying. Thanks for the change, looks like a great improvement!

clangd/index/FileIndex.cpp
64

Why make an extra copy of the symbols? Why not add macros to the same builder used in collector?

ioeric added inline comments.Sep 14 2018, 4:37 AM
clangd/index/FileIndex.cpp
64

index::indexTopLevelDecls will re-initialize the collector. This is safe with the current implementation, but I can imagine it goes wrong when we do more cleanup in the initialization.

ilya-biryukov added inline comments.Sep 14 2018, 5:00 AM
clangd/index/FileIndex.cpp
64

Sorry, missed the comment about it.
And if we do this after indexTopLevelDecls, than we'll be calling after the finish call. Sigh...

Doing an extra copy of the symbols here is wasteful and makes the code more complicated than it should be.

We have alternatives that could fix this:

  1. Move IndexMacro logic into index::indexTopLevelDecls
  2. Move macro collection to SymbolCollector::finish()
  3. Extract a function that creates a symbol for macro definition, so we don't need to call handleMacroOccurence and add can add extra macro symbols after SymbolCollector finishes.

(1) seems to be the cleanest option, WDYT?

ioeric updated this revision to Diff 165517.Sep 14 2018, 8:26 AM
  • Move macro collection to indexTopLevelDecls.

~1% increase in memory usage seems totally fine. Actually surprised it's that small.

Tested on a larger file: it's ~5% for ClangdServer.cpp. I think it's still worth it for the speedup.

clangd/index/FileIndex.cpp
64

(1) sounds good. D52098

sammccall accepted this revision.Sep 14 2018, 4:04 PM

This is definitely the right thing to do, thanks for finding it!
I've got a long comment about how everything used to be simpler in the good old days :-) I'm itching to refactor a bit, but land this first.

clangd/index/FileIndex.cpp
36

I don't understand this change. Don't we also want this only if IndexMacros is true?
(It's possible it doesn't make a difference due to implementation details but it seems worth being clear here)

clangd/index/FileIndex.h
93 ↗(On Diff #165517)

I'm concerned about the proliferation of parameters here. (Not new with this patch, it's been a continuous process - the first version had one input and one output!)
It's heading in the direction of being a catch-all "collect some data from an AST" function, at which point each caller has to think hard about every option and we're going to end up with bugs.
(For example: TestTU::index() passes "false" for IndexMacros. It seems to me maybe it should be "true". But it's really hard to tell.)
That's also pretty similar to the mission of SymbolCollector itself, so we're going to get some confusion there.

As far as I can tell, there's now two types of indexing that we do:

  • preamble indexing: we look at all decls, and only index those outside the main file. We index macros. We don't collect references. Inputs are ASTContext and PP.
  • mainfile-style indexing: we look only at top-level decls in the main file. We don't index macros. We collect references from the main file. Inputs are ParsedAST.

This really looks like it should be 2 functions with 2 and 1 parameters, rather than 1 function with 4.
Then callers will have two styles of indexing (with names!) to choose between, rather than being required to design their own. And we can get rid of the "is this a main-file index?" hacks in the implementation.

Sorry for jumping on you here, this change isn't any worse than the three previous patches that added knobs to this function.
This doesn't need to be addressed in this patch - could be before or after, and I'm happy to take this on myself. But I think we shouldn't kick this can down the road too much further, eventually we end up with APIs like clang :-)

This revision is now accepted and ready to land.Sep 14 2018, 4:04 PM
ioeric added inline comments.Sep 14 2018, 11:34 PM
clangd/index/FileIndex.h
93 ↗(On Diff #165517)

I'm definitely on board with this and happy to do the refactoring before landing this patch, to break API just once ;) Just to clarify my understanding before doing that, do we also want to split FileIndex::update into updatePreamble and updateMain? And the new indexAST will be indexMain and indexPreamble?

ioeric updated this revision to Diff 165959.Sep 18 2018, 6:45 AM
  • merge with origin/master
sammccall accepted this revision.Sep 18 2018, 7:15 AM

Something not listed in cons: because macros aren't namespaced and we don't have lots of signals, they can be really spammy.
Potentially, offering macros that aren't in the TU could be a loss even if it's a win for other types of signals.

We could always e.g. postfilter index macro results using the include structure of the preamble, so no concern for this patch, just something to think about for the followup.

We also need to make sure that we're not indexing/serving header guards as code completions (e.g. if SemaCodeComplete is currently taking care of this)

clangd/index/FileIndex.cpp
48

nit: please clang-format

Something not listed in cons: because macros aren't namespaced and we don't have lots of signals, they can be really spammy.
Potentially, offering macros that aren't in the TU could be a loss even if it's a win for other types of signals.

Aren't they already spammy from Sema? Sema can provide thousands of macros in the TU.

We penalize quality of macro symbols in the global index. Maybe we can do the same thing for dynamic index?

We could always e.g. postfilter index macro results using the include structure of the preamble, so no concern for this patch, just something to think about for the followup.

Sounds good.

We also need to make sure that we're not indexing/serving header guards as code completions (e.g. if SemaCodeComplete is currently taking care of this)

These symbols are already filtered out in SymbolCollector.

This revision was automatically updated to reflect the committed changes.