This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add check to detect external definitions with no header declaration
Needs ReviewPublic

Authored by njames93 on Jan 25 2020, 4:19 AM.

Details

Summary

Flags variables and functions with external linkage that don't have a declaration in the corresponding header file.
Addresses Feature request: possible check for definition with external linkage without corresponding declaration in a header.

This is the output from running against clang/lib with CheckAnyHeader enabled

Diff Detail

Event Timeline

njames93 created this revision.Jan 25 2020, 4:19 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 25 2020, 4:19 AM
njames93 edited the summary of this revision. (Show Details)Jan 25 2020, 4:22 AM
njames93 added a project: Restricted Project.
njames93 updated this revision to Diff 240371.Jan 25 2020, 4:26 AM
  • Remove unnecessary duplicated code

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: fail. Please format your changes with clang-format by running git-clang-format HEAD^ or applying this patch.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

This is what gets flagged when ran on clang/lib with CheckAnyHeader enabled (with it disabled it just goes crazy as a lot of clang headers files seem to share a source file for implementation)
clang/lib

njames93 updated this revision to Diff 240372.Jan 25 2020, 5:01 AM
  • fix formatting

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

njames93 updated this revision to Diff 240373.Jan 25 2020, 6:11 AM
  • Ignores implicit template instantiations
Eugene.Zelenko added inline comments.
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
15

Please also include StringRef.

clang-tools-extra/test/clang-tidy/checkers/Inputs/misc-missing-header-file-declaration/wrong_header.h
8

Please add newline.

clang-tools-extra/test/clang-tidy/checkers/misc-missing-header-file-declaration-any-header.cpp
80

Please add newline.

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

njames93 updated this revision to Diff 240375.Jan 25 2020, 6:36 AM
njames93 marked 3 inline comments as done.
  • Address nits

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

njames93 edited the summary of this revision. (Show Details)Jan 25 2020, 7:22 AM
njames93 updated this revision to Diff 240387.Jan 25 2020, 9:34 AM
  • Warn extern declarations in source files

Unit tests: fail. 62128 tests passed, 5 failed and 807 were skipped.

failed: libc++.std/language_support/cmp/cmp_partialord/partialord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongeq/cmp.strongeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_strongord/strongord.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakeq/cmp.weakeq.pass.cpp
failed: libc++.std/language_support/cmp/cmp_weakord/weakord.pass.cpp

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

Thank you for the contribution! I didn't review the code thoroughly yet, only the tests.

clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
30

I think we should consider any file other than the main file to be a header (anything brought in by #include). Projects have lots of different convention -- for example, LLVM uses a .inc extension for some generated files and x-macro files. The standard library does not have any extensions at all (and I'm sure some projects imitate that) etc.

76

This heuristic should be a lot more complex. In practice people have more complex naming conventions (for example, in Clang, Sema.h corresponds to many files named SemaSomethingSomething.cpp).

I think there is a high chance that checking only a header with a matching name will satisfy too few projects to be worth implementing.

On the other hand, if you could use C++ or Clang modules to narrow down the list of headers where the declaration should appear, that would be a much stronger signal.

clang-tools-extra/docs/clang-tidy/checks/misc-missing-header-file-declaration.rst
15

Unclear what happens when CheckAnyHeader=0.

clang-tools-extra/test/clang-tidy/checkers/Inputs/misc-missing-header-file-declaration/misc-missing-header-file-declaration.cpp.tmp.h
23

Need a test for a template defined in a header (shouldn't produce a warning).

Need a test for an inline function defined in a header (also no warning).

Need a test for a static function defined in a header (should produce no warning, but it is bad for other reasons, however I think they are out of scope for this checker).

Need a test for an anonymous namespace in a header (no warning).

clang-tools-extra/test/clang-tidy/checkers/misc-missing-header-file-declaration.cpp
2

Need lots of tests for classes, out-of-line member declarations, member templates, out-of-line member templates, class templates, variable templates etc.

81

Need a test for a template in an anonymous namespace.

njames93 marked 2 inline comments as done.Jan 27 2020, 2:58 AM

I'll work up those other test cases

clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
30

I agree in part, but there is one case that I didn't want to happen and that is when clang-tidy is ran as a text editor plugin. Often it will blindly treat the current open file as the main file which would lead to a lot of false positives, Would a better metric being matching on file extensions not corresponding to source files (c, cc, cxx, cpp)?

76

That is the reason I added the CheckAnyHeader option. For small projects the matching name would be usually be enough, but for big complex projects there is no feasible check that would work. Falling back to making sure every external definition has a declaration in at least one header will always be a good warning

gribozavr2 added inline comments.Jan 27 2020, 4:02 AM
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
30

I agree in part, but there is one case that I didn't want to happen and that is when clang-tidy is ran as a text editor plugin.

Does that really happen? That sounds like broken editor integration to me, because this is not how C++ without modules works...

Would a better metric being matching on file extensions not corresponding to source files (c, cc, cxx, cpp)?

Yes, I think that would be better, if we have to sniff file extensions.

76

That's the thing -- due to the lack of consistency in projects and style guides for C++, I think most projects will have to turn on CheckAnyHeader. We get implementation complexity in ClangTidy, false positives in high-profile projects, force users to configure ClangTidy to work well for their projects (unless we set CheckAnyHeader=1 by default... which then nobody would flip back to zero), and little benefit for end users.

njames93 marked 2 inline comments as done.Jan 27 2020, 5:07 AM
njames93 added inline comments.
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
30

Well it depends on the editor and language server, I know that if you have a valid compile commands clangd will look for the best source file to compile against. But I feel that there would be some basic editors which will blindly run clang-tidy against the open file. There is also the case of header only libraries that people want to get code analysis on, not sure if they maybe create a proxy file to include the header though.

76

Would you say the best way would be check if the source file name starts with the header file name by default. Or is that very specific to Clang?

/// <SomeHeaderImpl.cpp>
#include "SomeHeader.h"

This file would check for declarations in SomeHeader.h

Or maybe check if the header file name starts with the source file?

/// <SomeHeader.cpp>
#include "SomeHeaderImpl.h"

This file would check for declarations in SomeHeaderImpl.h.
Or even check both?

gribozavr2 added inline comments.Jan 27 2020, 6:57 AM
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

Would you say the best way would be check if the source file name starts with the header file name by default. Or is that very specific to Clang?

I don't think it is too specific, it should cover many projects I think.

Or maybe check if the header file name starts with the source file?

Seems like an unusual case to me, but I'm willing to be convinced otherwise.

njames93 updated this revision to Diff 240611.Jan 27 2020, 8:49 AM
njames93 marked 6 inline comments as done.
  • Tweaked default corresponding header settings
njames93 added inline comments.Jan 27 2020, 8:54 AM
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

I thought it was an unusual case and wasn't thinking it would be a good idea to add. I'll just set it up so that it will always look in header files whose names appear at the start of the main source file.

Unit tests: pass. 62197 tests passed, 0 failed and 815 were skipped.

clang-tidy: fail. clang-tidy found 0 errors and 1 warnings. 0 of them are added as review comments below (why?).

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

How is this different from -Wmissing-prototypes?

How is this different from -Wmissing-prototypes?

This checks variables too, and it looks for a prototype specifically in the header files. missing-prototypes just ensures there is a prototype of a function before its defined. It also checks based on linkage of the function/variable

How is this different from -Wmissing-prototypes?

This checks variables too, and it looks for a prototype specifically in the header files. missing-prototypes just ensures there is a prototype of a function before its defined. It also checks based on linkage of the function/variable

I observed a lot of cases when people cheated GCC -Wmissing-prototypes by adding prototypes in source file instead of header.

njames93 marked an inline comment as done.Jan 27 2020, 4:45 PM

Thanks so much for all the effort on this. I think it's really wonderful.

I've added a couple of comments elsewhere.

My other query: does/should this check cover types? Eg does/should it fire on a class definition in a .cpp file that isn't given internal-linkage? I'm inclined to say it should.

clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

FWIW, my instinct is that there are two separate questions:

  1. What sort of files should have their external-linkage definitions checked?
  2. What sort of included files should be part of the search for adequate declarations that match those definitions?

...and that my answers, in reverse order, are:

  1. All included files should be included (ie a check for a given external-linkage definition should be appeased by _any_ matching declaration in _any_ included file)
  2. Only main files. And (to prevent lots of false-positives when the tool is run over headers), only "c", "cc", "cxx", "cpp" files by default but with an option to check in _all_ main files.

I think this has the merits that it:

  • allows users to put their valid declaration in files with any extension they like
  • defaults to only firing when run against c/cc/cxx/cpp files but provides the option to fire when run against a file with _any_ extension

So I propose that there be one option and it be as follows:

.. option:: CheckExtDefnsInAllMainFiles

If set to 0 only check external-linkage definitions in main files with typical source-file extensions ("c", "cc", "cxx", "cpp").
If set to 1 check external linkage definitions in all main files.
Default value is 0.

clang-tools-extra/docs/clang-tidy/checks/misc-missing-header-file-declaration.rst
9

I think it would be worth an extra comment here to explain to users why this situation is fishy and what they should do about it. Eg along the lines of:

Giving a symbol external linkage indicates that you intend that symbol to be usable within other translation units, suggesting there should be a declaration of the symbol in a header. No such symbol was found in this translation unit.

If you want the symbol to be accessible from other translation units, add a declaration of it to a suitable header file included as part of this translation unit.

Otherwise, give the definition internal linkage, using static or an anonymous namespace.

If you think the symbol is already declared in a file that's included as part of this translation unit, check for any mismatches between the declaration and definition, including the namespace, spelling and any arguments.

njames93 marked 2 inline comments as done.Jan 31 2020, 8:43 AM

Thanks so much for all the effort on this. I think it's really wonderful.

I've added a couple of comments elsewhere.

My other query: does/should this check cover types? Eg does/should it fire on a class definition in a .cpp file that isn't given internal-linkage? I'm inclined to say it should.

Support for checking types should be either opt-in(or opt-out) but never forced. I feel like it would generate too much noise.

clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

So current behaviour is kind of explained that you can either search in headers that are a starting substring of the source file or just search for all headers, I tried to explain it in the documentation. By default it only checks for definitions in files with the extension c, cc, cpp or cxx. I could add an option to check all definitions expanded in the main file, regardless of the extension.

clang-tools-extra/docs/clang-tidy/checks/misc-missing-header-file-declaration.rst
9

I may copy that exact documentation. The current documentation is definitely needing some improvement.

Support for checking types should be either opt-in(or opt-out) but never forced. I feel like it would generate too much noise.

Okey dokes. That option could always be added as another feature in the future. Thanks very much for all work on this.

clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

Yep - I think I understand the current behaviour (but could well be wrong about that of course :) ).

But IIUC, I share @gribozavr2's concerns that the current behaviour's default of only accepting matching declarations in included files that match a specific pattern is going to trigger a lot of false-positives in a lot of projects. And I don't think this check should care about where the matching declaration is found, so long as there is such a declaration in a file included within the TU.

To my thinking, the implication of having a default policy that restricts the names of files in which we'll accept matching declarations is that we're policing the way that users name and organise their files (by spamming them with lots of false-positives if they don't do it in a way that satisfies our rule). I'd argue that file naming/organisation doesn't belong as part of this check (or even as part of any other clang-tidy check).

The other half of my previous comment was about a CheckExtDefnsInAllMainFiles option but I think that that's less important and that having a restriction to c, cc, cpp and cxx seems pretty reasonable to me—it could always be broadened in the future.

Overall, I think it's better to err on the side of not triggering the warning so as to avoid excessive false-positives; the check can always be made more sensitive in the future. I see the implications of that stance as being:

  • default to only checking the definitions in files that look like source files (so I think restricting to c, cc, cpp and cxx is good)
  • default to accepting matching declarations in all included files.
clang-tools-extra/docs/clang-tidy/checks/misc-missing-header-file-declaration.rst
9

Great.

njames93 marked an inline comment as done.Feb 1 2020, 1:43 AM
njames93 added inline comments.
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

The reason for the unrelated header check is to prevent potential bugs. If a declaration is in a completely unrelated header that happens to get included, it more than likely could be a collision which could then break ODR rules during linking. Perhaps a better thing to do would be if the decl isn't in a "corresponding" header, emit a warning, but say there was a decl found in example.h

tonyelewis added inline comments.Feb 3 2020, 2:33 AM
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
76

Thanks.

When you say "warning", do you mean something different from a check violation? Does that mean the check isn't fired but a warning message is printed?

Either way, my feeling is that it would be better to avoid diagnosing this situation (of a definition in a source-file matching a declaration in a source file with a filename stem that isn't a leading substring of the source file's name) at all.

One minor aspect: if this ODR-motivated diagnostic message is tied to the misc-missing-header-file-declaration check, that has the potential to be confusing to users. The existing check is about one specific issue ("you said you want X to be accessible in multiple TUs but you haven't provided a declaration for other TUs to share") which IMHO would be muddied by being conflated with a quite different issue ("the names of the files containing the definition and declaration of X are quite different, which we think smells a bit like an ODR violation").

Another issue: it feels like an inappropriate thing to be enforcing. C++ gives users complete freedom about the way they name their source-code files (indeed I don't think the standard even assumes source-code comes in files) and I don't think there's anything like universally agreed upon best-practice that deviating from my_stuff.cpp / my_stuff.hpp pairs is always bad. So it feels wrong to start making life difficult for users who deviate from that pattern for any reason.

And I think the rest of my concern relates to two empirical questions:

  • what fraction of the all such instances in the world's source code are actually OK (not an ODR violation)?
  • what fraction of projects contain such instances?

If I was convinced these numbers were both really low, I'd feel much more comfortable with clang-tidy firing warnings/check-violations. In that world, we could say: even though it's not wrong to deviate from matching my_stuff.cpp / my_stuff.hpp pairs, it's worth complete adherence to that pattern for the resultant ODR-violation-detection gains. (Side note: another empirical question is how many ODR-violations this check would detect that aren't already caught by other ODR-violation detecting tools, eg linkers' and ASan's)

But my instinct is that those fractions are both pretty high: that lots of projects involve some file-naming structures that would fall afoul of the rule and cause lots of false positives.

In other words, my instinct is that this pattern isn't very strong evidence of an ODR violation and that the line-noise/inconvenience cost would far outweigh the ODR-detecting benefit.

So I think that the check should always accept matching declarations in any header (or at least that that should be the default policy and that users explicitly opt in to the stricter policy).

Again: thanks for all the work on this check - it's really great and I really appreciate it.

njames93 updated this revision to Diff 243545.Feb 10 2020, 6:41 AM
  • Relaxed corresponding header
  • Added support for tag types
njames93 marked 2 inline comments as done.Feb 10 2020, 6:42 AM

Thanks very much.

njames93 updated this revision to Diff 308121.Nov 27 2020, 3:32 PM

I thought I'd bring this back up.

  • Rebased and tweaked to use a few new library features in clang-tidy since when this was first proposed.
  • It definitely has value but still maybe got a little bit before its ready to land.
  • Right now it will flag template definitions in source files, Is this good or bad behaviour?
  • It won't flag instantiations or explicit specializations, though there should be some better tests for that.
  • An option has been added to specify ImplementationFileExtensions. The check will only run if the main file matches those extensions. This is useful for editor workflows(like clangd) which run clang-tidy over a header file.

Running it under my local clangd instance has already found quite a few true positives as I go, (Probably should push a patch to fix those.)

njames93 updated this revision to Diff 308122.Nov 27 2020, 3:35 PM

Fix failing test case.

njames93 marked 2 inline comments as done.Nov 27 2020, 4:18 PM

Thank you very much for coming back to this and for bringing the changes up to date. I still think this would be really valuable.

Right now it will flag template definitions in source files, Is this good or bad behaviour?

Do you mean that both of the following would be flagged?

// a.cpp
template <typename> void f() {}
template <typename> class C {};

I'm not sure how I feel about that. It feels to me like a less clear signal that the implementation file is mistakenly out of sync with the headers. Also, I think it's better to err on the side of being conservative about what we flag at first, knowing we can always consider flagging more later on. But I'm not well enough informed to have strong feelings.

njames93 updated this revision to Diff 308165.Nov 28 2020, 8:03 AM

Removed warnings on template definitions.
Added some basic fix-it support, need some lexer magic to create the fix-it for declarations declared with extern.

njames93 updated this revision to Diff 308176.Nov 28 2020, 9:46 AM

Added fixit to remove extern.

@njames93 Thank you very much for your continued input on this. Is your ping for an automated process or a human? If the latter, is it for me or for the reviewers?

aaron.ballman added inline comments.Jan 4 2021, 12:05 PM
clang-tools-extra/clang-tidy/misc/MissingHeaderFileDeclarationCheck.cpp
35
85
88
163

clang-tidy diagnostics do not start with a capital letter, so these string literals should all be lowercase.

191

Should this be using configurationDiag() instead of llvm::errs()?

233–235

These string literals should also be lowercased.

254
clang-tools-extra/test/clang-tidy/checkers/misc-missing-header-file-declaration.cpp
13

Warning on this construct worries me -- it's not uncommon to use extern declarations in some language modes. For instance, in C, you'll still find plenty of older code bases that use an extern declaration of a system function rather than #include'ing the standard library header. Also, folks will use extern definitions as a way to smuggle data between TUs without exposing a public interface (not always the best of practices, but it is an approach to hiding implementation details).