I[18:37:50.083] BackgroundIndex: build symbol index periodically every 86400000 ms.
I[18:37:50.115] Enqueueing 3742 commands for indexing
I[18:37:50.280] [0/3742] Loaded shards from storage
I[18:37:51.026] [1/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Demangle/MicrosoftDemangleNodes.cpp (6422 symbols, 18236 refs, 161 files)
I[18:37:51.581] [2/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/Support/MD5.cpp (10204 symbols, 28897 refs, 232 files)
I[18:37:51.607] [3/3742] Indexed /usr/local/google/home/kadircet/repos/llvm/llvm/lib/MC/MCSchedule.cpp (11461 symbols, 32420 refs, 247 files)
Heh, where do these come from?
Everything else is vlog (or even dlog), what's the rationale of using log here?
Maybe use std::atomic instead?
Nit: maybe we should give this constant a name? Or maybe create a command line option for this?
Maybe we should add a comment about why we do this?
It's not obvious to me that we don't underflow here given we are using size_t. Maybe add an assert or some other sanity check?
+1 for atomic
ah interesting, my guess is static index without any "-I" flags to direct path shortening.
It is the period for building index data structures which helps making queries fast, as I mentioned in the comment we definitely don't need this to happen until we've indexed every file, and BackgroundIndex should in my opinion work in this mode if period was set to zero. So I don't see any point in giving it a name.
Btw, the reason we want to build index data structures in the end is to just see how long it takes.
not introduced by this change
that line is executed at most once per each increment above, if you follow the flow you can see the break below, but good point moving it next to break to better illustrate that.
It was to print output mentioned in the summary. What would you rather suggest? Other possibilities I had in mind:
I would propose to change the name of the tool.
The "background" part only makes sense in the context of running inside clangd, the index stored on disk is not "background" in any manner.
I'd go with something like build-clangd-index (the best option is probably clangd-indexer, but it's is already taken and we don't want to confuse the two tools).
Also not sure whether the BackgroundIndex is the right abstraction to use here. Conceptually, we want everything from the background index, except the "background" part and the fact that it listens to updates for compile commands (more stuff might get inconsistent in the future, e.g if we start watching for changes to headers, we would probably want to disable this functionality here). I.e. we went with BackgroundIndex for code reuse, not because it's the right abstraction for the job. I do think we could come up with something sensible.
That said, the right abstraction is not there now and it may take considerable refactoring to come up with one. And the tool seems to be useful regardless of the underlying implementation, not exactly sure what's the balance we should strike here (time spent vs useful results).
Kadir, what are your thoughts on that matter?
That's not the accurate name or description, right?
Maybe rename and document accordingly?
Why? Maybe add a comment?
I think that's actually a hard requirement.
Maybe use std::numeric_limits<size_t>::max() here?
We seem to be fighting with the interface we defined here.
We can't leave the function called like this, since we now seem to have a legitimate usage outside the test code.
Maybe rename it accordingly?
This change is not related to the rest of the patch.
Looking at the current state of BackgroundIndex, it has the following implementation details:
- Loading of shards from storage
- Storing of shards to storage
- Collecting symbols from a TU
- Sharding of symbol information collected from one translation unit
- Performing all these tasks in a thread pool that can change scheduling priority of the thread depending on the task running.
Our requirements from the tool that currently is not possible through BackgroundIndex:
- Running tasks without lowering scheduling priority
- Blocking until all of the indexing actions finishes
I believe we can achive the first one by adding some options. The second one is already possible but it is using a test method we implemented for tests, as you mentioned we can just change the name.
Of course the above mentioned solutions would imply exposing that functionality to every user of BackgroundIndex.
I feel like it is better than exposing rest of the implementation details I mentioned in the first section(and also would be a lot faster to implement). WDYT?
I should've rather performed this through enqueue, will change to use that.
To reiterate the offline discussion: the tool seems useful to me, but it would be best to keep the client code simpler, it's currently fighting with BackgroundIndex because it's trying to hijack some of its functionality.
More specifically, I propose to add a function to BackgroundIndex.h that would be a one-shot update of the index, the client code would become significantly simpler and we would have more flexibility in how we move code around in BackgroundIndex.cpp to actually make this happen.
Adding @sammccall, who might also be interested in this.
Leaving a summary of the offline discussion here.
@sammccall has pointed out maintaining and shipping more tools is work and we should probably avoid adding them upstream without careful design.
We ended up deciding to not land this particular upstream just yet, but anyone is free to patch this change on top of their LLVM repo in case they need something like this for debugging purposes.
Obviously no guarantees that this won't break.