
[clangd] Implemented indexing of standard library

Authored by kuhnel on Jun 30 2021, 1:00 AM.



This is only the indexing part; it is NOT wired up to the
rest of ClangdServer.

This is a step towards an implementation for


Event Timeline

sammccall added inline comments.Aug 5 2021, 5:15 AM

this path should probably use native conventions: "/stdlibheaders.cpp" isn't a good filename on windows.

Since it doesn't really need to exist, maybe #ifdef _WIN32 like we do for testRoot in unittests/TestFS would be nice. Such a virtualRoot() could go in FS.h if you like!

Would be nice to have "virtual" in the path so it's clear in e.g. error messages that it's not a real path.
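
A minimal sketch of the suggested helper, for illustration only. The name virtualRoot() comes from the comment above; the concrete paths, the virtualPath() helper and its parameter are assumptions, not code from the patch or from clangd.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: an absolute, native-style root for files that only
// exist in memory. "virtual" in the path signals that it is not a real one.
std::string virtualRoot() {
#ifdef _WIN32
  return "C:\\virtual";
#else
  return "/virtual";
#endif
}

// Hypothetical helper: joins a file name onto the virtual root using the
// platform's path separator.
std::string virtualPath(const std::string &FileName) {
  assert(!FileName.empty() && "a virtual file still needs a name");
#ifdef _WIN32
  const char Separator = '\\';
#else
  const char Separator = '/';
#endif
  return virtualRoot() + Separator + FileName;
}
```

With this, virtualPath("stdlibheaders.cpp") would yield a path under the virtual root on either platform, so an error message mentioning it is recognizably not about a file on disk.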


This isn't actually threadsafe though?
Calls to viewImpl() are supposed to return filesystems with independent state (workdir).

Maybe it doesn't actually bite us here but I don't see much cost to actually building the FS on demand. (The threadsafeFS should own the string and use a non-copying buffer)


this contains only the umbrella headers, and doesn't seem to be overlaid on the real filesystem. So how does this work?


no need for the -o either


CreateMemBuffer instead to avoid expressing this impossible error condition?


hang on, if we're providing the overridden buffer to prepareCompilerInstance (like we do for edited files in clangd that may or may not be saved on disk yet) then why do we need the VFS at all?


I don't think storing refs originating in the standard library is profitable overall.
Functionally I can think of cases it makes better & worse, but it's not key to our core use cases and it's the majority of index size IIRC.


similarly we don't need relations for this index i believe


we don't actually use the include graph afaics


This comment is copied from background index, we don't need it everywhere and it's less relevant here (we're not really compiling arbitrary broken code)


This never gets used, why are we setting it?


This should mention the standard library


we could filter Symbols to only include those in our list, rather than all the private implementation cruft. (I expect cruft is the majority by weight)
Not in this patch, but maybe a fixme?


this tracer has the wrong lifetime and won't measure the actual indexing time, hoist it to the top?


nit: librarx->library


this should probably be Dex instead of MemIndex unless it's tiny


This is where an overview of the feature would go :-)


the interface here will probably want to grow to include some sort of enum/config struct for the language/version/whatever we decide to support (at minimum I'd think C vs C++). Fine to just hardcode c++ for now, maybe leave a comment.


nit: llvm/clangd don't generally use this auto-everywhere style, rather StandardLibraryIndex Sli.
And this would be SLI rather than Sli.


I'm not sure we're testing what we want to be testing here.

In real life, the symbols are not going to be in the entrypoint, but a file included from it.
And clangd's indexer *does* treat the main file differently from others.

It's pretty awkward to actually fix but maybe worth a comment.


not sure why we're setting a filter here - we don't need to test that fuzzyfind works


AnyScope and Limit are not needed

#include "TestIndex.h"
MATCHER_P(Named, N, "") { return arg.Name == N; }
EXPECT_THAT(match(*Index, FuzzyFindRequest{}), ElementsAre(Named("myfunc"), Named("otherfunc")));

EXPECT_THAT/ElementsAre give much better error messages (e.g. if there are extra elements, it'll tell you what they are)
If you don't care if there are extra elements, just drop the assertion on size

kuhnel updated this revision to Diff 365181.Aug 9 2021, 7:35 AM
kuhnel marked 29 inline comments as done.
kuhnel edited the summary of this revision.

addressed code review comments

kuhnel added a comment.Aug 9 2021, 7:36 AM

Main points in the implementation are:

  • simplify the exposed interface

Good point, I added a new function indexStandardLibrary() as external interface.

  • i don't think we need to mess with VFS at all actually

Yes, removed that.

  • we should think a little about which index data we actually want to keep/use

I removed Refs, Relations and Graph as per your comment. However I have to admit, I don't know what they are and how they are used.
What's a good place to look at so that I learn what they do?

Next design questions seem to be about lifetime/triggering:

  • how many configurations of stdlib index to have
  • when do we build the indexes, and who owns them
  • what logic governs whether/which stdlib index is triggered, and where do we put it

While you're out, I'll try to set up something so that I can do some manual tests with a real standard library.


Not sure what you mean with CreateMemBuffer.
I replaced the if with an assert as this should not fail.


We don't need the VFS at all, you're right. Handing over a buffer is good enough.


So you would prefer that we change HeaderMock to only contain #include and then create the mock headers as virtual files in the MockFS?


AnyScope is actually needed, otherwise the result is empty.

kuhnel updated this revision to Diff 365370.Aug 10 2021, 12:28 AM

tried to fix Windows build failure

kuhnel updated this revision to Diff 365415.Aug 10 2021, 3:50 AM

fixed a couple of bugs

  • wrong usage of llvm::unique
  • wrong usage of static pointer
sammccall added inline comments.Aug 17 2021, 6:15 AM

"node" doesn't mean anything here.


this should be defined out-of-line unless it's performance-critical for some reason.

Conditional compilation in inline bodies is a magnet for ODR violations. _WIN32 is probably fine but no reason to scare the reader :-)


This is a (drive-)relative path, we have various places we need absolute paths and may want to reuse this there. Does C:\virtual work?


as mentioned this should ideally include some clue that it's a virtual path




don't use .hpp and rely on the driver picking the right language & version, this won't generalize. Instead, insert the flags "-xc++-header" and "-std=c++14" based on the StandardLibraryVersion


the static variable must be a string* rather than string to avoid global destructors.
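
For illustration, the pattern being asked for might look like the sketch below. The function name and the file name are hypothetical; only the idea (a function-local static pointer that is never deleted, so no destructor runs at program exit) is taken from the comment.

```cpp
#include <cassert>
#include <string>

// Hypothetical accessor. The string is allocated once on first use and
// intentionally leaked, so no global destructor is registered for it.
const std::string &virtualUmbrellaHeaderName() {
  static const std::string *Name = new std::string("virtual_umbrella_header");
  return *Name;
}
```

A plain `static const std::string Name = ...;` would behave the same while the program runs, but its destructor would run during shutdown, which is what the comment wants to avoid.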


I'm not sure what this means, I don't think there's anything better to do here.


why false? the default (true) is what clang's parser needs


pass the default nullptr instead of a no-op lambda, it allows the indexer to skip work


this assertion message doesn't say anything vs the assertion itself, either drop it or say why instead


nit: enum class to avoid polluting namespace.

llvm style capitalizes variables: CXX14


nit: "variant" rather than version to avoid confusion with language version?
(since this will cover c also)


The comment should be aimed at users of this module, not implementers of it :-)
This is the main API comment...


I think I agree with your comment elsewhere that it's sufficient to return unique_ptr, indicate it might be null, and log errors.


if introducing the enum i'd remove the default, this policy is best expressed at the call site


This class doesn't need to be public and I don't think it needs to exist.

generateIncludeHeader is a pure function of StandardLibraryVersion, it can be a free function.

This leaves only indexHeaders, which can also be a free function of StandardLibraryVersion and TFS.
(VirtualUmbrellaHeaderFileName is purely transient state, and isn't used by any tests)
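
A rough sketch of generateIncludeHeader as a free function. The function name and StandardLibraryVersion come from the comments above; the enumerator, the headersFor() helper and its three-entry header list are assumptions made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class StandardLibraryVersion { CXX14 };

// Hypothetical helper: the real list of standard headers would be much
// longer and would depend on the version.
std::vector<std::string> headersFor(StandardLibraryVersion) {
  return {"vector", "string", "unordered_map"};
}

// Pure function of the version: emits one #include line per header.
std::string generateIncludeHeader(StandardLibraryVersion Version) {
  std::string Result;
  for (const std::string &Header : headersFor(Version)) {
    assert(!Header.empty() && "header names must be non-empty");
    Result += "#include <" + Header + ">\n";
  }
  return Result;
}
```

Because nothing here depends on object state, there is no need for a class to hold it.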


as mentioned, umbrella header


this is gone

nridge added inline comments.Aug 17 2021, 12:34 PM

One could imagine picking a source file from the project's CDB, and using its flags to parse the standard library.

That could be relevant for macros that affect the way standard library headers are parsed (like _GLIBCXX_DEBUG perhaps?)

I removed Refs, Relations and Graph as per your comment. However I have to admit, I don't know what they are and how they are used.
What's a good place to look at so that I learn what they do?

Sorry about missing this.
The best place is the SymbolIndex interface - Symbols corresponds to the fuzzyFind/lookup methods, refs and relations each have their own methods. You can somewhat guess from the methods/data structures what these are used for, but clangd can answer your question! e.g. find-references on refs() will show you it is used in the find-references implementation :-) As well as rename and others.

Include graph is a bit of a special case, it's basically just used for partitioning, incrementally updating, and loading the background index IIRC.


Oh, that makes sense. I'd still probably not do this, given:

  • for projects with a CDB, we'll probably bg-index most of the stdlib in that configuration soon anyway.
  • until we see evidence otherwise, my guess is differences are pretty minor. (Honestly if it were easy to just ship a prebuilt index for code completion, I would be tempted.)
  • it adds some constraints on design/layering/sequencing etc
  • it makes questions of how many configurations to build/when to reuse vs invalidate more complicated
kuhnel updated this revision to Diff 367523.Aug 19 2021, 9:13 AM
kuhnel marked 18 inline comments as done.

addressed review comments, has use-after-free problem

kuhnel added inline comments.Aug 19 2021, 9:16 AM

Yes, my question was: can we get the real compile command from the file in which we're querying the standard library index and then extract the (relevant) compiler arguments from that?

That might also help in guessing the current language variant.


@sammccall I seem to be running into a use-after-free problem here. Debugging the whole thing shows that Index is pointing to an invalid address. So the problem is somewhere between returning the unique_ptr from indexUmbrellaHeaders(...) and assigning it to the Index variable.

Can you please take a look and give me a hint how to fix this?

nridge added inline comments.Aug 19 2021, 11:08 AM

I think your issue may be that Dex doesn't actually take ownership of the slabs that get passed to it; the slabs need to outlive it.

Dex has another constructor which allows it to also take ownership, and a Dex::build() helper function to call it -- you probably want to be using that.
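
The ownership issue can be illustrated with a self-contained analogy. This is not clangd's Dex API: ToyIndex, Slab and indexUmbrellaHeader below are stand-ins invented for the sketch, which only mirrors the split between a non-owning constructor and an owning build() helper.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a symbol slab: the storage that actually owns the data.
using Slab = std::vector<std::string>;

class ToyIndex {
public:
  // Non-owning: the caller must keep Symbols alive as long as the index.
  explicit ToyIndex(const Slab &Symbols) : View(&Symbols) {}

  // Owning: the slab is moved into storage that lives as long as the index.
  static std::unique_ptr<ToyIndex> build(Slab Symbols) {
    auto Owned = std::make_shared<Slab>(std::move(Symbols));
    auto Index = std::make_unique<ToyIndex>(*Owned);
    Index->KeepAlive = std::move(Owned);
    assert(Index->View == Index->KeepAlive.get() && "view must point into the owned slab");
    return Index;
  }

  bool contains(const std::string &Name) const {
    for (const std::string &Symbol : *View)
      if (Symbol == Name)
        return true;
    return false;
  }

private:
  const Slab *View;
  std::shared_ptr<Slab> KeepAlive; // Null when the index does not own data.
};

std::unique_ptr<ToyIndex> indexUmbrellaHeader() {
  Slab Symbols = {"std::vector", "std::string"};
  // Returning std::make_unique<ToyIndex>(Symbols) here would leave the index
  // pointing at a local that is destroyed when this function returns.
  return ToyIndex::build(std::move(Symbols));
}
```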

kuhnel marked an inline comment as done.Aug 20 2021, 1:23 AM
kuhnel added inline comments.

Awesome, thx @nridge ! That fixed the use-after-free!
I was searching in the wrong place the whole time...

kuhnel updated this revision to Diff 367737.Aug 20 2021, 2:01 AM
kuhnel marked an inline comment as done.

addressed code review comments

also fixed use-after-free

sammccall accepted this revision.Aug 20 2021, 2:26 AM

Remainder is just nits, looks good!




nit: the unhandled case can't dynamically happen. This should probably be a switch, and the default: case should be llvm_unreachable("Unhandled language variant")
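
A sketch of the suggested shape, assuming an enum named StandardLibraryVariant with a single CXX14 enumerator. The function name languageFlags is hypothetical; the two flags are the ones mentioned earlier in this review. To stay self-contained, assert plus std::abort stands in for llvm_unreachable, and it is placed after the switch rather than in a default: case so the compiler can still warn when a new enumerator is left unhandled. A default: case as suggested would work equally well.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>
#include <vector>

enum class StandardLibraryVariant { CXX14 };

// Hypothetical helper: maps a variant to the driver flags used to parse the
// umbrella header.
std::vector<std::string> languageFlags(StandardLibraryVariant Variant) {
  switch (Variant) {
  case StandardLibraryVariant::CXX14:
    return {"-xc++-header", "-std=c++14"};
  }
  // Stand-in for llvm_unreachable("Unhandled language variant").
  assert(false && "Unhandled language variant");
  std::abort();
}
```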


tandard -> Standard


missing space after colon


elog("Standard Library Index: {0}", std::move(Err));


The somehow/magically is by looking at the clang::LangOptions of the file being parsed :-)
This can happen if/when we move the triggering into TUScheduler which obtains and parses the command line flags.

Concretely I'd expect to resolve this FIXME by adding some function like Optional<StandardLibraryVariant> chooseStandardLibrary(const LangOptions&)

If this makes sense to you, you might want to make the comment a bit less hand-wavy
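
One possible shape for that function, as a sketch only. ToyLangOptions is a stand-in for the single clang::LangOptions field this example reads, std::optional (C++17) is used in place of llvm's Optional so the snippet compiles on its own, and mapping every C++ file to CXX14 is an assumption.

```cpp
#include <cassert>
#include <optional>

enum class StandardLibraryVariant { CXX14 };

// Stand-in for clang::LangOptions, reduced to the one field used here.
struct ToyLangOptions {
  bool CPlusPlus = false;
};

// Returns the variant to index for a file parsed with Opts, or nullopt when
// no suitable standard library index exists (e.g. plain C for now).
std::optional<StandardLibraryVariant>
chooseStandardLibrary(const ToyLangOptions &Opts) {
  if (Opts.CPlusPlus)
    return StandardLibraryVariant::CXX14;
  return std::nullopt;
}
```

Returning an empty optional keeps the "no index for this language" case explicit at the call site instead of silently falling back to C++.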


librarVariant ->libraryVariant


I'd say rather "this index allows completion of standard library symbols whose headers have not been included yet".

The current text implies that we'd turn this off once we see #include <vector>, thus breaking completion of e.g. unordered_map. I don't think we want to.


Sorry, I think I wasn't clear: the language variant should still be a parameter here, it just shouldn't have a default value. Instead the caller should pass it explicitly, this makes it obvious at the callsite that there's an imperfect assumption being made.


nit: umbrellaHeader singular. (The umbrella is the file mapped to HeaderSources, the headers it includes are not umbrella headers for our purposes)


enums are passed by value, not const reference
(and below)


nit: capitalization

This revision is now accepted and ready to land.Aug 20 2021, 2:26 AM
kuhnel updated this revision to Diff 367787.Aug 20 2021, 7:07 AM

fixing windows build

What is the status of this -- is it ready to be merged?

What is the status of this -- is it ready to be merged?

This works as far as it goes, but it needs someone to wire it up completely: build these indexes somewhere that's less blocking than the main thread, determine the right one to attach dynamically based on the file language, etc.

The original plan was that Christian would do this as a followup but that's not likely to happen. Meanwhile many of the usual suspects are a bit backed up. We can definitely land this if you or anyone might want to finish it...