This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Watch for changes in compile_commands.json
Needs ReviewPublic

Authored by simark on Jul 12 2018, 2:12 PM.

Details

Summary

This patch adds support for watching for changes to
compile_commands.json, and reparsing files if needed.

The watching is done using the "workspace/didChangeWatchedFiles"
notification, so the actual file watching is the frontend's problem.

To keep things simple, we react to any change involving a file called
"compile_commands.json".

The chosen strategy tries to avoid useless reparses. We don't want to
reparse a file if its compile commands don't change. So when we get a
change notification, we:

  1. Save the compile commands of all open files on the side.
  2. Clear everything we know about compilation databases and compilation commands.
  3. Query the compile commands again for all open files (which will go read the possibly updated compile commands). If the commands are different than the saved ones, we reparse the file.

I updated the vscode-clangd extension. If you don't feel inspired, you
can use this small .cpp to test it:

#ifdef MACRO
#warning "MACRO is defined"
#else
#warning "MACRO is not defined"
#endif

int main() {}

and this compile_commands.json:

[{
  "directory": ".",
  "file": "test.cpp",
  "arguments": ["g++", "-c", "test.cpp", "-DMACRO=2"]
}]

Adding and removing the "-DMACRO=2" argument, you should see a different

Event Timeline

simark created this revision.Jul 12 2018, 2:12 PM

Note, D49265 in clang is a prerequisite.

simark updated this revision to Diff 155284.Jul 12 2018, 3:21 PM

Remove unintended changes

Thanks for putting up this change! It can be really annoying that clangd does not pick up the compile commands that got updated.

A few things of the top of my head on why we might want to punt on using the LSP watches:

  • File watching may not be implemented in the language server at all. That's certainly the case for our hacked-up ycmd<->lsp bridge.
  • At some point we'll be interested in changes to the headers our sources have included. This would involve a lot of watches for a dynamically changing set of files. There's a good chance we'll have to consider using native file watch APIs instead of the LSP methods to handle those (e.g., for performance reasons).

We've been talking about starting to implement a better buildsystem integration lately and taking compile_commands.json changes into account is certainly on the table. We'll probably have some concrete proposals in August/September.

In the meanwhile, maybe removing the caching of compile commands could fix the problem? I don't have any data on how important the caching is to performance (I would assume it to not be very important, since we only request compile commands on the file changes; completion, findDefinitions, etc reuse the last compile command anyway).
D48071 added an option to disable the caching, it never landed because we didn't agree on the default, but we can make it happen.
It looks like a simpler workaround that works on all clients and we can figure out how to properly integrate file watching later. WDYT?

@sammccall pointed out that I've been looking at a different layer of caching.
Clangd does per-directory (to avoid reloading compilation database multiple times) and per-file (to avoid calling into compilation database multiple times, it's expensive for our internal CDB) caching of compile commands.
D48071 allows to get rid of per-file cache, but still keeps the per-directory cache, so this patch alone won't fix the problem.

We have a somewhat simple fix in mind: for interactive tools, like clangd, the JSONCompilationDatabase could be configured to check for changes to compile_commands.json and rebuild the compile commands on each call to its methods, if necessary. E.g. we can still keep the per-directory "caching" of CompilationDatabases inside clangd, but CompilationDatabase instances will start returning new compile commands when compile_commands.json is updated.
This would mean more FS accesses (and, potentially, json parsing) on getCompileCommands and the same instance of CompilationDatabase will now be potentially providing different compile commands for the same file on different getCompileCommands calls.
However,

  • The FS access concern shouldn't be noticeable for the clang tools: parsing/preprocessing of files is usually done after the getCompileCommands call, so the overhead of files access to compile_commands.json and json reads should be negligible, compared to the work to process C/C++ files.
  • The fact that const object now changes seems to be a potential source of confusion for the tools and (maybe?) some of them are not prepared for this, so we should probably allow turning this behavior on and off. clangd (and other interactive tools that might be interested in this), would turn it on, while it will be off by default.

It seems this could be implemented somewhere around the JSONCompilationDatabasePlugin pretty easily. WDYT?
I'm also happy to come up with a patch to JSONCompilationDatabasePlugin, at this point I'm pretty familiar with it. Unless you think this approach is flawed or want to do it yourself, of course. In any case, please let us know what's your opinion on this.

Thanks for putting up this change! It can be really annoying that clangd does not pick up the compile commands that got updated.

A few things of the top of my head on why we might want to punt on using the LSP watches:

  • File watching may not be implemented in the language server at all. That's certainly the case for our hacked-up ycmd<->lsp bridge.

I guess you mean language client here, not language server. In our case we already have a client capable of file watching, so it's convenient for us to do that (plus, this is what the LSP specification recommends). If you want to make clangd capable of watching files natively, I am not against it, but it's not on our critical path. As long as the file is watched, I'm happy.

Our client does not yet support file watch registration, that's why I did not implement it, but once we upgrade to a client that does, the plan is to make clangd ask the client to watch **/compile_commands.json. Right now, we hardcoded in our client that we want to watch any file called compile_commands.json:

https://github.com/theia-ide/theia/pull/2405/commits/fe48e8d63f98edd8792522381111844a58f2cfa1

  • At some point we'll be interested in changes to the headers our sources have included. This would involve a lot of watches for a dynamically changing set of files. There's a good chance we'll have to consider using native file watch APIs instead of the LSP methods to handle those (e.g., for performance reasons).

What would be the performance bottleneck? There would be a lot of watched files fore sure, but very little change events.

We've been talking about starting to implement a better buildsystem integration lately and taking compile_commands.json changes into account is certainly on the table. We'll probably have some concrete proposals in August/September.

In the meanwhile, maybe removing the caching of compile commands could fix the problem? I don't have any data on how important the caching is to performance (I would assume it to not be very important, since we only request compile commands on the file changes; completion, findDefinitions, etc reuse the last compile command anyway).
D48071 added an option to disable the caching, it never landed because we didn't agree on the default, but we can make it happen.
It looks like a simpler workaround that works on all clients and we can figure out how to properly integrate file watching later. WDYT?

... see answer below, as you have already provided an updated to this ...

@sammccall pointed out that I've been looking at a different layer of caching.
Clangd does per-directory (to avoid reloading compilation database multiple times) and per-file (to avoid calling into compilation database multiple times, it's expensive for our internal CDB) caching of compile commands.
D48071 allows to get rid of per-file cache, but still keeps the per-directory cache, so this patch alone won't fix the problem.

Indeed.

We have a somewhat simple fix in mind: for interactive tools, like clangd, the JSONCompilationDatabase could be configured to check for changes to compile_commands.json and rebuild the compile commands on each call to its methods, if necessary. E.g. we can still keep the per-directory "caching" of CompilationDatabases inside clangd, but CompilationDatabase instances will start returning new compile commands when compile_commands.json is updated.
This would mean more FS accesses (and, potentially, json parsing) on getCompileCommands and the same instance of CompilationDatabase will now be potentially providing different compile commands for the same file on different getCompileCommands calls.
However,

  • The FS access concern shouldn't be noticeable for the clang tools: parsing/preprocessing of files is usually done after the getCompileCommands call, so the overhead of files access to compile_commands.json and json reads should be negligible, compared to the work to process C/C++ files.
  • The fact that const object now changes seems to be a potential source of confusion for the tools and (maybe?) some of them are not prepared for this, so we should probably allow turning this behavior on and off. clangd (and other interactive tools that might be interested in this), would turn it on, while it will be off by default.

It seems this could be implemented somewhere around the JSONCompilationDatabasePlugin pretty easily. WDYT?
I'm also happy to come up with a patch to JSONCompilationDatabasePlugin, at this point I'm pretty familiar with it. Unless you think this approach is flawed or want to do it yourself, of course. In any case, please let us know what's your opinion on this.

The important point of this patch is that when some change happens to some compile_commands.json, we detect which open file has changed compile flags and therefore needs re-parse. It's not clear to me how this is achieved with your proposal.

I guess you mean language client here, not language server. In our case we already have a client capable of file watching, so it's convenient for us to do that (plus, this is what the LSP specification recommends). If you want to make clangd capable of watching files natively, I am not against it, but it's not on our critical path. As long as the file is watched, I'm happy.

Yes, language client, thanks!

  • At some point we'll be interested in changes to the headers our sources have included. This would involve a lot of watches for a dynamically changing set of files. There's a good chance we'll have to consider using native file watch APIs instead of the LSP methods to handle those (e.g., for performance reasons).

What would be the performance bottleneck? There would be a lot of watched files fore sure, but very little change events.

Right, the number of watched files be large. I bet some implementations will have trouble handling the file watching properly. But that's just a guess, we definitely need some data there. File watching seems like a complicated problem and it may happen to be core enough for clangd performance, so I wouldn't bet on every language client doing it in a reliable, performant way if we can solve it reliably in clangd instead. At least we should have a fallback implementation in case the clients don't support it.
But none of that is set in stone, just thinking out loud.

We've been talking about starting to implement a better buildsystem integration lately and taking compile_commands.json changes into account is certainly on the table. We'll probably have some concrete proposals in August/September.

In the meanwhile, maybe removing the caching of compile commands could fix the problem? I don't have any data on how important the caching is to performance (I would assume it to not be very important, since we only request compile commands on the file changes; completion, findDefinitions, etc reuse the last compile command anyway).
D48071 added an option to disable the caching, it never landed because we didn't agree on the default, but we can make it happen.
It looks like a simpler workaround that works on all clients and we can figure out how to properly integrate file watching later. WDYT?

... see answer below, as you have already provided an updated to this ...

@sammccall pointed out that I've been looking at a different layer of caching.
Clangd does per-directory (to avoid reloading compilation database multiple times) and per-file (to avoid calling into compilation database multiple times, it's expensive for our internal CDB) caching of compile commands.
D48071 allows to get rid of per-file cache, but still keeps the per-directory cache, so this patch alone won't fix the problem.

The important point of this patch is that when some change happens to some compile_commands.json, we detect which open file has changed compile flags and therefore needs re-parse. It's not clear to me how this is achieved with your proposal.

You're right, it won't, the file will only get reparsed with the new compile command whenever user modifies it.
The approach is not ideal, but may be a good middle ground before we figure out how we approach file watching in clangd. Note that there are other things that won't force the updates currently, e.g. changes to the included headers won't cause rebuilds of the source files until user modifies them. And those are much more frequent than changes to compile_commands.json, so they seem like a more pressing problem.

The approach is not ideal, but may be a good middle ground before we figure out how we approach file watching in clangd. Note that there are other things that won't force the updates currently, e.g. changes to the included headers won't cause rebuilds of the source files until user modifies them. And those are much more frequent than changes to compile_commands.json, so they seem like a more pressing problem.

Ok, I agree that having clangd watch files itself could be necessary at some point (when the client does not support it), but it would have to be configurable. In our case, we have efficient enough file watching in the client, and already watch the files in the workspace. Since the inotify watches are limited per-user, having clangd watch them too means we'll have duplicates, and therefore waste a rather limited resource.

Actually, clangd could simply use the client capabilities. If the client advertises support for file watching with dynamic registration, use that, otherwise use the internal mechanism.

In the mean time, I don't think the current patch paints us in a corner. The logic of checking whether the effective compile commands for open files changes would stay even if the file watching mechanism changes.

Ok, I agree that having clangd watch files itself could be necessary at some point (when the client does not support it), but it would have to be configurable. In our case, we have efficient enough file watching in the client, and already watch the files in the workspace. Since the inotify watches are limited per-user, having clangd watch them too means we'll have duplicates, and therefore waste a rather limited resource.

Actually, clangd could simply use the client capabilities. If the client advertises support for file watching with dynamic registration, use that, otherwise use the internal mechanism.

Yeah, sounds reasonable. The limits on the number of watches is definitely something that could bite the editor+clangd bundle, so that's a good reason to use the client watching if that's available.
We want to share some design around this area in Aug-Sep.

In the mean time, I don't think the current patch paints us in a corner. The logic of checking whether the effective compile commands for open files changes would stay even if the file watching mechanism changes.

Agreed. But let's try to keep the change minimal, specifically we could get away without replacing 'reparseOpenedFiles', see the inline comment about it.

clangd/ClangdLSPServer.cpp
430

Maybe keep the old logic of reparsing all open files? This would make the change way simpler and I don't think we need this extra complexity in the long run, when we have better integration with the build system.

ClangdServer will reuse the preamble if compile command didn't change anyway, so reparse will be very fast and shouldn't be affected.
If the compile command does change, we'll retrigger the full rebuild.

clangd/clients/clangd-vscode/src/extension.ts
35

These watches apply to the children of the workspace root only, right?
E.g., I quite often open the workspace in 'llvm/tools/clang/tools/extra/clangd', would my compile_commands.json, that's outside the workspace, be watched for changes?

malaperle added inline comments.Jul 24 2018, 7:47 AM
clangd/ClangdLSPServer.cpp
430

I think the change is not that complex but brings much added value. About the integration with the build system, there are many build systems out there so I don't think better integration will be useful in many scenarios (plain make, custom builds, etc). This solution is generic enough so that any build system that generates compile_commands.json will be supported in a pretty good way.

simark added inline comments.Jul 24 2018, 7:54 AM
clangd/ClangdLSPServer.cpp
430

@malaperle also suggested an alternative way of doing it. Instead of saving the the compile commands in a custom structure before clearing the cache, we could save the compile flags in the ParsedAST object. When some compile_commands.json changes, we can compare the new compile flags with this saved copy. I think it would make the code a bit more straightforward.

clangd/clients/clangd-vscode/src/extension.ts
35

You are right, I don't think files outside the workspace would get watched.

For your use case, we'll need clangd to watch (or ask the client to watch) the file.

Just a couple of high-level comments here:

  • I'm not sure we can/should commit to supporting editor-based file watching forever.
    • One natural long-term direction would be to get this functionality into JSONCompilationDatabase, and clients of that don't have an LSP client to provide events
    • client support is going to remain uneven in the foreseeable future. LSP is as always underspecified here, and there's no chance that every editor plugin is going to do a reasonable job of recursive file watching, even among those that advertise support for it.
    • if/when we have a native implementation, supporting multiple mechanisms with different layering requirements to get at most a 2x win in watcher resource usage seems like a dubious way to spend our complexity budget.
  • it looks like the current design throws away all compilation databases (via the global DB) when *any* compile_commands.json changes. I think it's important that other stateful compilation databases aren't thrown away for unrelated reasons. e.g. the Bazel DB takes nontrivial time to reinitialize.

if/when we have a native implementation, supporting multiple mechanisms with different layering requirements to get at most a 2x win in watcher resource usage seems like a dubious way to spend our complexity budget.

I guess the only think that might force us to do this is the low system limit on the number of file watches. I do remember rumors that it can be a problem in, but I haven't really seen those on my own, so maybe I'm just pretending.
But I do agree that we shouldn't commit to have yet-another implementation in the long-term, unless we have good justification for it.

it looks like the current design throws away all compilation databases (via the global DB) when *any* compile_commands.json changes. I think it's important that other stateful compilation databases aren't thrown away for unrelated reasons. e.g. the Bazel DB takes nontrivial time to reinitialize.

Good point, we should fix this. Started a thread about it in the inline comments.

clangd/ClangdLSPServer.cpp
430

I think the change is not that complex but brings much added value.
@malaperle also suggested an alternative way of doing it. Instead of saving the the compile commands in a custom structure before clearing the cache, we could save the compile flags in the ParsedAST object. When some compile_commands.json changes, we can compare the new compile flags with this saved copy. I think it would make the code a bit more straightforward.

The no-op rebuilds in case nothing changes is a good and long overdue optimization, and I agree that ParsedAST or TUScheduler::update is probably the right place to put it into. Any other benefits we get from snapshotting the compile commands here? Could we make this optimization in a separate change? It does not seem directly related to the file watching changes. Also happy to look at it, finding the right place to do it might involve digging through layers of classes that manage the AST.

About the integration with the build system, there are many build systems out there so I don't think better integration will be useful in many scenarios (plain make, custom builds, etc). This solution is generic enough so that any build system that generates compile_commands.json will be supported in a pretty good way.

Just to clarify, the "better buildsystem integration" I refer to is not about enabling support for more build systems (though, that would be nice-to-have), it's about building abstractions that better support the current use-cases, i.e. making the connection between the CompiationDatabase and compiler_commands.json more explicit, allowing the track changes in the build system, etc. We think that file-watching will definitely be part of it, but we don't have full design yet.

In any case, we do think that there are improvements to be made both in compile_commands.json and in our internal Bazel integration.

clangd/GlobalCompilationDatabase.cpp
71

Could we only clear the databases for the folder under which the changed compile_commands.json lives?

clangd/clients/clangd-vscode/src/extension.ts
35

Got it. That would be nice to have, but will have to wait for dynamic registration of file watchers...

clangd/ClangdLSPServer.cpp
430

D49783 makes the rebuild noop if compile command and preamble deps don't change, I think this should be good enough to keep rebuildOpenedFiles call for now.

Herald added a project: Restricted Project. · View Herald TranscriptJul 30 2019, 11:28 AM
Herald added a subscriber: kadircet. · View Herald Transcript