Page MenuHomePhabricator

[clangd] Introduce background-indexer

Authored by kadircet on Mar 20 2019, 10:39 AM.



Sample output:
I[18:37:50.083] BackgroundIndex: build symbol index periodically every 86400000 ms.
I[18:37:50.115] Enqueueing 3742 commands for indexing
I[18:37:50.280] [0/3742] Loaded shards from storage
I[18:37:51.026] [1/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Demangle/MicrosoftDemangleNodes.cpp (6422 symbols, 18236 refs, 161 files)
I[18:37:51.581] [2/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/MD5.cpp (10204 symbols, 28897 refs, 232 files)
I[18:37:51.607] [3/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/MC/MCSchedule.cpp (11461 symbols, 32420 refs, 247 files)

Event Timeline

kadircet created this revision.Mar 20 2019, 10:39 AM
Eugene.Zelenko added inline comments.

Please use relative path. Same below.

ilya-biryukov added inline comments.Mar 20 2019, 10:52 AM

Heh, where do these come from?
Does our include insertion prefer to add global paths in some cases? Dynamic index?


Everything else is vlog (or even dlog), what's the rationale of using log here?


Maybe use std::atomic instead?


Unnecessary empty line.

jkorous added inline comments.Mar 20 2019, 11:04 AM

Nit: maybe we should give this constant a name? Or maybe create a command line option for this?


Maybe we should add a comment about why we do this?


It's not obvious to me that we don't underflow here given we are using size_t. Maybe add an assert or some other sanity check?


+1 for atomic

kadircet updated this revision to Diff 191684.Mar 21 2019, 7:17 AM
kadircet marked 11 inline comments as done.
  • Address comments
kadircet added inline comments.Mar 21 2019, 7:17 AM

ah interesting, my guess is static index without any "-I" flags to direct path shortening.


It is the period for building index data structures which helps making queries fast, as I mentioned in the comment we definitely don't need this to happen until we've indexed every file, and BackgroundIndex should in my opinion work in this mode if period was set to zero. So I don't see any point in giving it a name.

Btw, the reason we want to build index data structures in the end is to just see how long it takes.


not introduced by this change


that line is executed at most once per each increment above, if you follow the flow you can see the break below, but good point moving it next to break to better illustrate that.


It was to print output mentioned in the summary. What would you rather suggest? Other possibilities I had in mind:

  • Poll from client for current status
  • Pass an option to background index to dump these logs conditionally

I would propose to change the name of the tool.
The "background" part only makes sense in the context of running inside clangd, the index stored on disk is not "background" in any manner.
I'd go with something like build-clangd-index (the best option is probably clangd-indexer, but it's is already taken and we don't want to confuse the two tools).

Also not sure whether the BackgroundIndex is the right abstraction to use here. Conceptually, we want everything from the background index, except the "background" part and the fact that it listens to updates for compile commands (more stuff might get inconsistent in the future, e.g if we start watching for changes to headers, we would probably want to disable this functionality here). I.e. we went with BackgroundIndex for code reuse, not because it's the right abstraction for the job. I do think we could come up with something sensible.

That said, the right abstraction is not there now and it may take considerable refactoring to come up with one. And the tool seems to be useful regardless of the underlying implementation, not exactly sure what's the balance we should strike here (time spent vs useful results).

Kadir, what are your thoughts on that matter?


That's not the accurate name or description, right?
We would actually walk up the directory tree as normal and discover any compilation database up the tree, right?

Maybe rename and document accordingly?


Why? Maybe add a comment?


I think that's actually a hard requirement.
We definitely want to run with normal priority when running as a separate tool.


Maybe use std::numeric_limits<size_t>::max() here?
Would mean we'll never build the dex index, exactly what we want here.


We seem to be fighting with the interface we defined here.
If we need to support an operation of re-building the index for the underlying CDB, let's add it explicitly.


We can't leave the function called like this, since we now seem to have a legitimate usage outside the test code.

Maybe rename it accordingly?


This change is not related to the rest of the patch.
Even adding some stats makes sense, could you move into a separate change to keep this change minimal?

kadircet marked an inline comment as done.Apr 11 2019, 9:03 AM

Looking at the current state of BackgroundIndex, it has the following implementation details:

  • Loading of shards from storage
  • Storing of shards to storage
  • Collecting symbols from a TU
  • Sharding of symbol information collected from one translation unit
  • Performing all these tasks in a thread pool that can change scheduling priority of the thread depending on the task running.

Our requirements from the tool that currently is not possible through BackgroundIndex:

  • Running tasks without lowering scheduling priority
  • Blocking until all of the indexing actions finishes

I believe we can achive the first one by adding some options. The second one is already possible but it is using a test method we implemented for tests, as you mentioned we can just change the name.
Of course the above mentioned solutions would imply exposing that functionality to every user of BackgroundIndex.

I feel like it is better than exposing rest of the implementation details I mentioned in the first section(and also would be a lot faster to implement). WDYT?


I should've rather performed this through enqueue, will change to use that.

To reiterate the offline discussion: the tool seems useful to me, but it would be best to keep the client code simpler, it's currently fighting with BackgroundIndex because it's trying to hijack some of its functionality.
More specifically, I propose to add a function to BackgroundIndex.h that would be a one-shot update of the index, the client code would become significantly simpler and we would have more flexibility in how we move code around in BackgroundIndex.cpp to actually make this happen.

Adding @sammccall, who might also be interested in this.

To make it clear, I think the question is not just "which part of functionality is missing in BackgroundIndex", it's rather "which part of BackgroundIndex we don't need".

Leaving a summary of the offline discussion here.

@sammccall has pointed out maintaining and shipping more tools is work and we should probably avoid adding them upstream without careful design.
We ended up deciding to not land this particular upstream just yet, but anyone is free to patch this change on top of their LLVM repo in case they need something like this for debugging purposes.

Obviously no guarantees that this won't break.

kadircet abandoned this revision.Apr 24 2019, 2:27 AM