[Tooling] JSONCompilationDatabasePlugin infers compile commands for missing files
ClosedPublic

Authored by sammccall on Sep 6 2018, 5:35 AM.

Details

Summary

See the existing InterpolatingCompilationDatabase for details on how this works.
We've been using this in clangd for a while, the heuristics seem to work well.

Diff Detail

Repository
rC Clang
sammccall created this revision.Sep 6 2018, 5:35 AM
bkramer accepted this revision.Sep 12 2018, 10:49 AM

I like it

This revision is now accepted and ready to land.Sep 12 2018, 10:49 AM
This revision was automatically updated to reflect the committed changes.
mgorny added a subscriber: mgorny.Oct 11 2018, 11:38 PM

Unless my bisect is mistaken, this broke Clang Python binding test:
https://github.com/llvm-mirror/clang/blob/master/bindings/python/tests/cindex/test_cdb.py#L34

The test apparently assumes that compilation database will not return anything for a file that does not exist. Could you please suggest how the test should be adjusted for this change?

Sorry about that. I wasn't familiar with the python bindings.
Your bisect is correct, we changed the behavior that test is testing: now
an approximate match will be returned.

I guess either:

  • just remove that test (depending on how we feel about smoke-testing vs

exhaustively testing wrapperl like this)

  • change it to verify exactly what the fuzzy-matched result is, e.g.
    • add -DFEATURE=0 to the first compile command for "Project.cpp" in the

test CDB so it's easy to recognize

  • make the lookup key "/home/john.doe/MyProject/Project.h" so it's

obvious what the match should be

  • verify that the lookup result has "-DFEATURE=0" and "Project.h" in

its args
It seems to me that test_lookup_succeed should also verify what gets
returned (otherwise it's not really testing anything at this point).

I can try to put together a patch, though I don't currently have a setup to
test it - let me know if this would be useful!

Before this patch, missing compilation database entries resulted in "Skipping .... Compile command not found." which is assumed by the tests in this clang-query patch: https://reviews.llvm.org/D54109

After this patch, a command is inferred, but this is not always desired.
Use case: find all function calls with a certain name using the compilation database produced by CMake.
Example command:

clang-query -p=/tmp/wsbuild -c 'set output print' -c 'm callExpr(callee(functionDecl(hasName("uat_new"))))' $(grep -rl uat_new epan/dissectors/)

Expected result for some template files which do not exist in the compilation database:

Skipping .../epan/dissectors/asn1/x509af/packet-x509af-template.c. Compile command not found.

Actual result (clang-query tries to parse the contents which will fail since it is not an actual C source file):

.../epan/dissectors/asn1/x509af/packet-x509af-template.c:18:10: fatal error: 'packet-ber.h' file not found

I'm not entirely sure what to do here. The old behavior works great in cases where a complete database is available (produced by CMake). The new behavior might work better for clangd (?), but it breaks a use case (see above).

Before this patch, missing compilation database entries resulted in "Skipping .... Compile command not found." which is assumed by the tests in this clang-query patch: https://reviews.llvm.org/D54109

After this patch, a command is inferred, but this is not always desired.
Use case: find all function calls with a certain name using the compilation database produced by CMake.
Example command:

clang-query -p=/tmp/wsbuild -c 'set output print' -c 'm callExpr(callee(functionDecl(hasName("uat_new"))))' $(grep -rl uat_new epan/dissectors/)

Expected result for some template files which do not exist in the compilation database:

Skipping .../epan/dissectors/asn1/x509af/packet-x509af-template.c. Compile command not found.

Actual result (clang-query tries to parse the contents which will fail since it is not an actual C source file):

.../epan/dissectors/asn1/x509af/packet-x509af-template.c:18:10: fatal error: 'packet-ber.h' file not found

I'm not entirely sure what to do here. The old behavior works great in cases where a complete database is available (produced by CMake). The new behavior might work better for clangd (?), but it breaks a use case (see above).

For clangd, but also clang-tidy and clang-query when the user *does* want to use it on files not represented in the CDB. (e.g. stale or headers)
There's indeed a tension here, because the CDB discovery needs to have a default configuration.

That said, in this case the behavior looks appropriate to me: you've explicitly specified files on the command line, ignoring them and returning with status 0 seems surprising.
For the case of "search over all TUs in CDB", the CDB does offer the ability to list TUs and iterate over compile commands, and ClangTool lets you run in this mode. We've discussed in the past adding a filename filter to AllTUsExecutor, which would be useful for this purpose and others. @ioeric

I'm not entirely sure what to do here. The old behavior works great in cases where a complete database is available (produced by CMake). The new behavior might work better for clangd (?), but it breaks a use case (see above).

For clangd, but also clang-tidy and clang-query when the user *does* want to use it on files not represented in the CDB. (e.g. stale or headers)
There's indeed a tension here, because the CDB discovery needs to have a default configuration.

Support for querying header files seems valuable and a good argument to keep this change.

That said, in this case the behavior looks appropriate to me: you've explicitly specified files on the command line, ignoring them and returning with status 0 seems surprising.

The test explicitly lists them, but in practice the find or grep -rl provides the file names which might contain uninteresting matches that are not present in the CDB. As a compromise, it could return with status 1 but proceed querying files that were successfully parsed.

For the case of "search over all TUs in CDB", the CDB does offer the ability to list TUs and iterate over compile commands, and ClangTool lets you run in this mode. We've discussed in the past adding a filename filter to AllTUsExecutor, which would be useful for this purpose and others. @ioeric

AllTUsToolExecutor looks useful to enable concurrency, but a filename filter would not help in my case. The reason I do a "cheap" grep first before using clang-query is because building the AST is slow and consumes a lot of memory (after ~1k .c files, 12G was in use).

Querying the CDB and filtering out entries is currently possible, but quite verbose:

grep -le 'search term' $(jq -r '.[].file' compile_commands.json) | xargs clang-query -c 'm ...'

Thank you for the feedback, my concerns with this CDB patch has been resolved.