This is an archive of the discontinued LLVM Phabricator instance.

[Tooling] A CompilationDatabase wrapper that infers header commands.
ClosedPublic

Authored by sammccall on Mar 28 2018, 4:15 PM.

Details

Summary

The wrapper finds the closest matching compile command using filename heuristics and makes minimal tweaks so it can be used with the header.

This isn't hooked up to anything by default, we could consider automatically wrapping commands that tools get from the autodetectFrom* methods. But my immediate plan is to enable it in clangd: D45007.

Diff Detail

Repository
rL LLVM

Event Timeline

sammccall created this revision.Mar 28 2018, 4:15 PM
sammccall updated this revision to Diff 140160.Mar 28 2018, 4:49 PM

Handle the case where no points are awarded (e.g. system headers) and document awards better.

sammccall updated this revision to Diff 140349.Mar 29 2018, 3:30 PM

Add tests, fix minor bugs, add lots of comments and structure.

sammccall edited the summary of this revision. (Show Details)Mar 29 2018, 3:35 PM
sammccall added a reviewer: ilya-biryukov.
sammccall added a subscriber: bkramer.
ilya-biryukov added inline comments.Apr 3 2018, 3:34 AM
lib/Tooling/InterpolatingCompilationDatabase.cpp
26 ↗(On Diff #140349)

Maybe add a comment describing the use-cases we had in mind while designing these heuristics?

  • .cpp and .h files usually have the same (or slightly modified) name, usually the prefix match,
  • LLVM (and other) codebases that put .h and .cpp files into different directories,
  • even random matches are better than arbitrary defaults,
  • ...
83 ↗(On Diff #140349)

Summarizing the offline discussion, we could exclude suffix matches from the initial version. This would make the code much simpler, and it seems most C++ projects we know of would actually work with prefix-only matches.

115 ↗(On Diff #140349)

This class seems to do two somewhat orthogonal things:

  • build and query the index structure for the paths,
  • handle queries to inner CDB and patch the compile commands for other files accordingly.

Maybe we could extract the code that handles the index into a separate class?

141 ↗(On Diff #140349)

Maybe store lower-cased paths in the index and compare case-insensitively when querying?
Having slight case mismatches is not uncommon and case-sensitivity shouldn't ever be the defining factor for this kind of heuristics.

147 ↗(On Diff #140349)

Same as prev. comment, maybe comment on why 4 was chosen here? Maybe use a named constant?

187 ↗(On Diff #140349)

Maybe add a comment why 2 is chosen here? Also, maybe use a named constant?

sammccall updated this revision to Diff 140791.Apr 3 2018, 7:54 AM
sammccall marked 6 inline comments as done.

Address review comments.

sammccall added inline comments.Apr 3 2018, 7:54 AM
lib/Tooling/InterpolatingCompilationDatabase.cpp
115 ↗(On Diff #140349)

I split out FilenameIndex, which just does the filename->filename mapping. The CompilationDatabase implementation now only contains the interface methods and adjust().

sammccall updated this revision to Diff 140792.Apr 3 2018, 7:55 AM

clang-format

sammccall updated this revision to Diff 140802.Apr 3 2018, 8:22 AM

Add more tests, and only add -x when actually needed.

ilya-biryukov added inline comments.Apr 4 2018, 3:35 AM
lib/Tooling/InterpolatingCompilationDatabase.cpp
96 ↗(On Diff #140802)

OriginalPaths is empty at this point, did we intend to sort Filenames instead?

99 ↗(On Diff #140802)

Given that we do an extra allocation when using lower() anyway, an extra copy into Strings is redundant.
Do we really need the arena at this point? It adds extra copies that we might not want.
It might give better memory locality, though, so it may be faster to use it overall, so up to you.

224 ↗(On Diff #140802)

Maybe add an assertion that Idx is non-empty?

308 ↗(On Diff #140802)

It feels like this heuristics only works for headers and files without extension (i.e. probably also headers).
E.g., if we have a .cpp file and .c file, then trying to infer args for .c file from .cpp file is probably the wrong thing to do. And using Fortran flags for C or C++ is certainly the wrong thing to do.

It seems transferring flags between different languages is never fine, except for C/C++ headers. WDYT?

sammccall updated this revision to Diff 140941.Apr 4 2018, 4:56 AM
sammccall marked 3 inline comments as done.

Address comments, except cross-language flag transfer which needs more thought.

lib/Tooling/InterpolatingCompilationDatabase.cpp
308 ↗(On Diff #140802)

Urgh, you're right, this is dubious. But I think your suggestion is too narrow:

  • transferring flags between *.m, *.mm, and *.h seems fine (mixed m and mm isn't that uncommon I think). -x should be set on the headers but not on the m or mm files.
  • transferring between *.c and *.cc doesn't seem always wrong. Many -W, -I, and -D flags are shared (aren't these the most important ones?). Clearly adding -x is bad in this case.
  • yeah, fortran... we should drop those, but I'd wait for a report. compile_commands is a clang "standard" afaik, so putting fortran there doesn't make sense unless the build system doesn't know about languages.
  • also if we hard-ban some candidates, we no longer have the guarantee that we can pick a best candidate, which adds complexity

So I'd suggest:

  • in addition to the "implied language changed" guard, only add -x to certain languages
  • maybe we should reward same-language somehow. This is tricky, because if there's a compile command for one header, it might be quite unusual. Also there'll be *lots* of matches. Not sure how to best do this.
sammccall updated this revision to Diff 141172.Apr 5 2018, 9:33 AM

Address cross-language issues in a more comprehensive way.
Prefers (potentially) same-language over cross-language, handles cross-language
flag transfers correctly using -x and handling (dropping) -std.

sammccall marked 2 inline comments as done.Apr 5 2018, 9:34 AM
ilya-biryukov accepted this revision.Apr 9 2018, 2:35 AM

LGTM with a small nit.
Really excited about this landing!

lib/Tooling/InterpolatingCompilationDatabase.cpp
353 ↗(On Diff #141172)

Maybe put these fields into struct Candidate {}?
The code would, arguably, be easier to read. Up to you.

This revision is now accepted and ready to land.Apr 9 2018, 2:35 AM
This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.
sammccall marked an inline comment as done.