This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Don't traverse the AST within uninteresting files during indexing
ClosedPublic

Authored by sammccall on May 20 2020, 7:05 AM.

Details

Summary

We already skip function bodies from these files while parsing, and drop symbols
found in them. However, traversing their ASTs still takes a substantial amount
of time.

Non-scientific benchmark on my machine:

background-indexing llvm-project (llvm+clang+clang-tools-extra), wall time
before: 7:46
after: 5:13
change: -33%

Indexer.cpp libclang should be updated too, I'm less familiar with that code,
and it's doing tricky things with the ShouldSkipFunctionBody callback, so it
needs to be done separately.

Diff Detail

Event Timeline

sammccall created this revision.May 20 2020, 7:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2020, 7:05 AM

@adamcz I think this is relevant to your interests maybe? :-)

kadircet accepted this revision.May 20 2020, 3:09 PM
kadircet added inline comments.
clang-tools-extra/clangd/unittests/IndexActionTests.cpp
22

nit using ::testing::EndsWith

was this add using tweak ?

261

maybe give the member a different name to distinguish it from the local var in function ?

clang/include/clang/Index/IndexingOptions.h
40

s/references that/references to that

clang/lib/Index/IndexingAction.cpp
142

are we happy with the copy here ?

I am a little uneasy, as these functions can preserve state. Currently the one used in here and the one used in IndexingContext::indexTopLevelDecl are different copies.

Maybe rather progate a null function into the IndexASTConsumer and create this default lambda in there, making use of the function inside IndexingContext::IndexOpts without a copy?

This revision is now accepted and ready to land.May 20 2020, 3:09 PM

Correct me if I'm wrong, but this might lead to a file never being updated in the index, even after edit, right?
Let's say you have file a.cpp that includes a.h. a.h has
namespace foo {

#define EXPAND(foo) whatever
#include "b.h"

}
all nodes from b.h will be inside the namespace node in a.h. If a.h did not change, b.h will never be indexed.

It's a corner case, of course, so may the speed gain is worth the cost, but don't we have inline #includes like this even in clang codebase?

sammccall marked 5 inline comments as done.May 25 2020, 11:22 AM

Correct me if I'm wrong, but this might lead to a file never being updated in the index, even after edit, right?
Let's say you have file a.cpp that includes a.h. a.h has
namespace foo {

#define EXPAND(foo) whatever
#include "b.h"

}
all nodes from b.h will be inside the namespace node in a.h. If a.h did not change, b.h will never be indexed.

Yes, there's some potential regression here. It exists if:

  • the including file is unmodified, the included is modified
  • the including file defines the top-level decl (e.g. enclosing namespaces), otherwise change has no effect
  • the symbols are considered part of the included file for sharding purposes, otherwise this already didn't work. (I think they are in most cases)

I think I'm comfortable throwing this in the "clangd is designed for well-structured codebases" bucket, along with non-self-contained files in other contexts (b.h is necessarily not self-contained here). I'm also comfortable saying tablegen isn't a well-structured part of LLVM :-)
And for preamble index, (which doesn't yet have this optimization) I think that's the end of the story.
However, this being persistently sticky *forever* with the background index does seem pretty horrible. I can't see an elegant fix though, and a 33% speedup for background index is an important win (it's a top clangd complaint).

Any bright ideas? If not, maybe we land this and try to fix it somehow later, or just accept it as one of our speed/accuracy tradeoffs.

clang-tools-extra/clangd/unittests/IndexActionTests.cpp
22

Yes, it was.
I feel a bit silly as I'm well aware of the behaviour here and have argued it's the right thing, and wrote testing::EndsWith inline, extracted it, and didn't check the output (as it was offscreen).

@adamcz FYI
If we see this a lot maybe we should consider something like "if all existing usings in the block are globally qualified...". Current behavior is definitely defensible though.

clang/lib/Index/IndexingAction.cpp
142

are we happy with the copy here ?

Note that we already have a copy: we take IndexingOptions by const ref and createIndexingASTConsumer copies it, including the function. So anything that relies on the identity of the function passed in, or breaks semantics of copying, is busted already.

as for internal state (e.g. a cache)... I'd agree if this were a heavily used library function (find a way to capture by pointer/shared_ptr). But it's not, and neither of the in-tree callers work this way - I'm not sure it's worth spending API complexity on.

Maybe rather progate...

Hmm, I like the idea that these are clearly separate options at the lower internal layer, and that blurs the lines a bit.
(I'd like *more* making them completely the same, but halfway isn't a happy place to be I think)

sammccall marked an inline comment as done.

Tweak names, style, comments

adamcz marked an inline comment as done.May 25 2020, 12:57 PM

I have no objections to landing this as-is, even if it represents a regression for some evil codebases.

A possible solution is to iterate through all the includes in a given file (SourceManager knows that already) and if any of them are outside of preamble, do not use this optimization on that file. Should be easy and, hopefully, fast. Most files are well-behaved and have includes in preamble only, thus the optimization would still kick in most of the time.

clang-tools-extra/clangd/unittests/IndexActionTests.cpp
22

Agreed. I'm starting to think we should have some sort of config file that defines style for the codebase (beyond clang-format config, things like always-use-::-for-using or never-add-using-for-::std) and possibly guess the values, when missing, during background indexing. It could simplify code edits a lot.

kadircet marked an inline comment as done.May 25 2020, 9:55 PM
kadircet added inline comments.
clang/lib/Index/IndexingAction.cpp
142

as for internal state (e.g. a cache)... I'd agree if this were a heavily used library function (find a way to capture by pointer/shared_ptr). But it's not, and neither of the in-tree callers work this way - I'm not sure it's worth spending API complexity on.

SG. I wasn't concerned about the copying but rather lambda invoked in ShouldSkipFunctionBody and ShouldTraverseDecl being different. I agree that preserving internal state is not that important for current in-tree uses, just wanted to make sure the discrepancy didn't go unnoticed :D

This revision was automatically updated to reflect the committed changes.