This is an archive of the discontinued LLVM Phabricator instance.

[JSONCompilationDatabase] Strip distcc/ccache/gomacc wrappers from parsed commands.
ClosedPublic

Authored by sammccall on Jul 7 2019, 8:07 AM.

Details

Summary

It's common to use compiler wrappers by setting CC="gomacc clang++".
This results in both args appearing in compile_commands.json, and clang's driver
can't handle this.

This patch attempts to recognize this pattern (by looking for well-known
wrappers) and dropping argv0 in this case.

It conservatively ignores other cases for now:

  • wrappers with unknown names
  • wrappers that accept -flags
  • wrappers where the compiler to use is implied (usually cc or gcc)

This is done at the JSONCompilationDatabase level rather than somewhere more
fundamental, as (hopefully) this isn't a general conceptual problem, but a messy
aspect of the nature of the ecosystem around compile_commands.json.
i.e. compilation databases more tightly tied to the build system should not have
this problem.

Diff Detail

Event Timeline

sammccall created this revision.Jul 7 2019, 8:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 7 2019, 8:07 AM
MaskRay added a subscriber: MaskRay.Jul 7 2019, 9:14 AM

I just wanted to say I hope there is some way to opt out this behavior. Some users (myself:) ) of JSONCompilationDatabase do not want to handle command line option filtering in this layer. I think this logic should be raised to the consumer layer.

MaskRay removed a subscriber: MaskRay.

I just wanted to say I hope there is some way to opt out this behavior. Some users (myself:) ) of JSONCompilationDatabase do not want to handle command line option filtering in this layer. I think this logic should be raised to the consumer layer.

The design of the compilation database/plugin model makes it pretty hard to add options.

Can you explain a bit more why a clang-based tool would want to see the wrapper command? AFAIK the intention of this interface has been to provide argv for the driver to consume.

Is distcc gcc.exe or gomacc gcc.exe possible?

The design of the compilation database/plugin model makes it pretty hard to add options.

Yes. It also lacks user settings. Downstream compile_commands.json consumers may provide their own command line option filtering mechanism that users can customize and add more compiler schedulers. For some tools, they probably don't want to see filtering applying on multiple layers, especially if the filtering mechanism done at the JSONCompilationDatabase layer may have false positives.

but a messy aspect of the nature of the ecosystem around compile_commands.json.

I'll appreciate it if you propose something to make cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ninja -t comdb etc more precise:)

Is distcc gcc.exe or gomacc gcc.exe possible?

The design of the compilation database/plugin model makes it pretty hard to add options.

Yes. It also lacks user settings. Downstream compile_commands.json consumers may provide their own command line option filtering mechanism that users can customize and add more compiler schedulers. For some tools, they probably don't want to see filtering applying on multiple layers, especially if the filtering mechanism done at the JSONCompilationDatabase layer may have false positives.

but a messy aspect of the nature of the ecosystem around compile_commands.json.

I'll appreciate it if you propose something to make cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ninja -t comdb etc more precise:)

We have recently implemented command_launcher property in GN which is prepended to generated command in Ninja but omitted from the compilation database generated by GN. CMake has CMAKE_{C,CXX}_COMPILER_LAUNCHER which works exactly the same way. That addresses this problem in GN and CMake. Furthermore, bear automatically strips compiler wrappers from the generated compilation database. However, there are other ways to generate compilation database, e.g. ninja -t compdb, and those suffer from the described problem so I think we still need this.

Is distcc gcc.exe or gomacc gcc.exe possible?

The design of the compilation database/plugin model makes it pretty hard to add options.

Yes. It also lacks user settings. Downstream compile_commands.json consumers may provide their own command line option filtering mechanism that users can customize and add more compiler schedulers. For some tools, they probably don't want to see filtering applying on multiple layers, especially if the filtering mechanism done at the JSONCompilationDatabase layer may have false positives.

Do you have examples of things that would break with this filtering applied?

but a messy aspect of the nature of the ecosystem around compile_commands.json.

I'll appreciate it if you propose something to make cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ninja -t comdb etc more precise:)

I think that's a somewhat different scope than this patch.

MaskRay added inline comments.Jul 8 2019, 10:59 PM
clang/lib/Tooling/JSONCompilationDatabase.cpp
272

If you do this, it may not work with ccache gcc.exe. If ccache/gomacc/distcc cannot be used as the compiler driver, just remove the check of HasCompiler?

MaskRay added inline comments.Jul 8 2019, 11:01 PM
clang/lib/Tooling/JSONCompilationDatabase.cpp
292

This loop may be too much. One-shot unwrapCommand(Arguments) may just work

sammccall marked 4 inline comments as done.Jul 9 2019, 12:53 PM

Yes. It also lacks user settings. Downstream compile_commands.json consumers may provide their own command line option filtering mechanism that users can customize and add more compiler schedulers. For some tools, they probably don't want to see filtering applying on multiple layers, especially if the filtering mechanism done at the JSONCompilationDatabase layer may have false positives.

I've been careful here to be conservative, I'd be interested if there's potential for false positives.

Regarding redundant filtering: it's a plausible concern in the abstract.
None of the in-tree tools have layers accounting for these wrappers, though. And I don't think we really want only some tools handling obscure build configurations, so this seems to be the right layer.
Generally speaking, it's up to out-of-tree tools to avoid duplication with upstream APIs, rather than the other way around.

but a messy aspect of the nature of the ecosystem around compile_commands.json.

I'll appreciate it if you propose something to make cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=On ninja -t comdb etc more precise:)

The gn/cmake features Petr described sound good to me. For ninja, given its abstraction level I think solving it there is actually worse than solving it here.

clang/lib/Tooling/JSONCompilationDatabase.cpp
272

If you do this, it may not work with ccache gcc.exe

Right! Fixed this so both ccache and gcc can have an exe extension.

If ccache/gomacc/distcc cannot be used as the compiler driver, just remove the check of HasCompiler?

They can. There are three modes:

  • ccache gcc file.c (the mode we're trying to match)
  • ccache file.c (the mode we're trying to avoid matching)
  • g++ file.c (with g++ symlinked to ccache - we don't even notice)

Rewrote the comment to explain these cases more clearly.

292

Using ccache and distcc together is a common pattern.

(Is there a specific problem? Obviously it is possible to leave this out, but I can't see how it could cause problems that wouldn't occur with a single wrapper)

sammccall updated this revision to Diff 208786.Jul 9 2019, 12:54 PM
sammccall marked 2 inline comments as done.

Handle .exe, expand comments.

MaskRay added inline comments.Jul 9 2019, 8:23 PM
clang/lib/Tooling/JSONCompilationDatabase.cpp
277

Is this possible?

% ccache a.c
ccache: error: Could not find compiler "a.c" in PATH
292

Probably fine if Wrapper == "distcc" || Wrapper == "gomacc" || Wrapper == "ccache" is used above.

sammccall updated this revision to Diff 208942.Jul 10 2019, 6:42 AM

tweak comment: ccache doesn't support the second invocation mode

sammccall marked 3 inline comments as done.Jul 10 2019, 6:44 AM
sammccall added inline comments.
clang/lib/Tooling/JSONCompilationDatabase.cpp
277

Oops - it's possible for distcc and gomacc, but ccache doesn't support it.

I updated the comment, though I don't think it's worth branching the behavior. Worst case is that we have some false negatives where ccache doesn't get unwrapped, but I think they won't happen in practice (otherwise distcc and gomacc would have FAQ/workarounds for these).

MaskRay accepted this revision.EditedJul 12 2019, 2:23 AM

Using ccache and distcc together is a common pattern.

My preference would be just allow one single compiler launcher, i.e. distcc ccache gcc is not allowed, and don't think of gomacc a.cc (it is possible but we don't necessarily support all its use cases). Making distcc/ccache/gomacc gcc/gcc.exe a.cc work out-of-the-box should be good enough. distcc ccache gcc a.cc users are encouraged to set up correct cmake/other build system options. Then

while (unwrapCommand(Arguments));

becomes

unwrapCommand(Arguments);

These commands can be freely used as extensionless header files: #include <distcc>. That said, I don't have objection to this approach.

This revision is now accepted and ready to land.Jul 12 2019, 2:23 AM
sammccall marked an inline comment as done.Jul 12 2019, 2:59 AM

These commands can be freely used as extensionless header files: #include <distcc>. That said, I don't have objection to this approach.

I think you're talking about a command like ccache distcc -Xclang -emit-pch where ccache is the name of a header file.

This seems implausible:

  • JSON compilation databases almost never contain header commands
  • if there's no extension on the file, clang requires -x c-header etc before the filename AFAICT
  • such commands don't work with distcc or gomacc as the wrapper, because they do the same heuristic detection and will detect the filename is actually the compiler. (Though it may work with ccache)
  • extensionless source files are almost only used in the standard library, which doesn't have these names *and* won't appear in compile_commands.json

It also doesn't seem to be related to looping: it's a false positive that could occur on any iteration including the first.

So it's a minor point, but I'm inclined to have the loop here

This revision was automatically updated to reflect the committed changes.