This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Header guard check can skip past license comment
Needs ReviewPublic

Authored by njames93 on Jul 6 2020, 6:35 AM.

Details

Summary

Add an option for header guard derived checks to control whether to skip past a license comment when adding a guard to file currently missing one.
Whats identified as a license comment is a block of comments right at the start of a file that is followed by an empty line.

All these examples are being interpreted as the first tokens in a file.

// This is identified as a license comment.
// It can span multiple lines too.

// This is not part of the license comment as its detached.
// This is not identified as a license comment as the
// block is followed by code.
void foo();
// This is identified as a license comment as its followed by
// an empty line

void foo();

Diff Detail

Event Timeline

njames93 created this revision.Jul 6 2020, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 6 2020, 6:35 AM
njames93 marked an inline comment as done.Jul 6 2020, 7:10 AM
njames93 added inline comments.
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
313–316

clang-format(trunk) says what I submitted was correct, clang-format-10 is suggesting the change. Which do I trust??

njames93 updated this revision to Diff 277348.Jul 13 2020, 2:09 AM

Added /**/ style license comment test cases

 // This is not identified as a license comment as the
// block is followed by code.
void foo();

FWIW: https://github.com/GrammaTech/gtirb-pprinter/blob/master/include/gtirb_pprinter/AttPrettyPrinter.hpp or https://github.com/GrammaTech/gtirb/blob/master/include/gtirb/AuxData.hpp (so there are projects which do not put a newline between the license and code).

clang-tools-extra/clang-tidy/ClangTidyCheck.h
48 ↗(On Diff #277348)

It makes me sad how much we're having to sprinkle std::move() calls around (I find the calls to move() to be more distracting that declaring the parameter types to be const std::string&.) Were these found mechanically?

clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
313–316

Either is defensible, I'm fine with using what trunk says.

clang-tools-extra/unittests/clang-tidy/LLVMModuleTest.cpp
12

Unrelated to this patch, but neat to know that there is zero testing of this on Windows.

224

You seem to be missing tests that show the license and header guards are both correct and found no issues.

njames93 marked 2 inline comments as done.Jul 13 2020, 6:34 AM
 // This is not identified as a license comment as the
// block is followed by code.
void foo();

FWIW: https://github.com/GrammaTech/gtirb-pprinter/blob/master/include/gtirb_pprinter/AttPrettyPrinter.hpp or https://github.com/GrammaTech/gtirb/blob/master/include/gtirb/AuxData.hpp (so there are projects which do not put a newline between the license and code).

Short of creating an AI that understands context it won't be possible to determine the difference between license and general documentation, in any case I feel this heuristic is the safest way to ensure good coverage with minimised risk of inserting the guard in the middle of documentation,

clang-tools-extra/clang-tidy/ClangTidyCheck.h
48 ↗(On Diff #277348)

These are me screwing around with my branches and will be removed

clang-tools-extra/unittests/clang-tidy/LLVMModuleTest.cpp
224

That wouldn't be testing new behaviour added here. The check can already find header guards when there is a license, The code I have added only affects when no guard is found and it needs to add one. I can still add those cases for piece of mind.

njames93 updated this revision to Diff 277411.Jul 13 2020, 6:45 AM
  • Added expect no changes test
 // This is not identified as a license comment as the
// block is followed by code.
void foo();

FWIW: https://github.com/GrammaTech/gtirb-pprinter/blob/master/include/gtirb_pprinter/AttPrettyPrinter.hpp or https://github.com/GrammaTech/gtirb/blob/master/include/gtirb/AuxData.hpp (so there are projects which do not put a newline between the license and code).

Short of creating an AI that understands context it won't be possible to determine the difference between license and general documentation, in any case I feel this heuristic is the safest way to ensure good coverage with minimised risk of inserting the guard in the middle of documentation,

My instinct is that we shouldn't be trying to play those games in the first place and should consider *all* leading comments and empty (whitespace-only) lines as part of the "license" and expect the first significant token to be the header guard. e.g., this isn't about the license at all, it's about whether you can have prose before the header guard or not. It's not uncommon for projects to put prose before header guards, nor is it uncommon for it to go after the header guards. tbh, that feels a bit like an option for the feature rather than an automatic behavior because I could also see a project wanting to enforce a consistent style.

clang-tools-extra/unittests/clang-tidy/LLVMModuleTest.cpp
224

I'd appreciate the test coverage because it's not always immediately clear why some of the new tests are expected to fail.

njames93 marked 3 inline comments as done.Jul 17 2020, 9:34 AM
 // This is not identified as a license comment as the
// block is followed by code.
void foo();

FWIW: https://github.com/GrammaTech/gtirb-pprinter/blob/master/include/gtirb_pprinter/AttPrettyPrinter.hpp or https://github.com/GrammaTech/gtirb/blob/master/include/gtirb/AuxData.hpp (so there are projects which do not put a newline between the license and code).

Short of creating an AI that understands context it won't be possible to determine the difference between license and general documentation, in any case I feel this heuristic is the safest way to ensure good coverage with minimised risk of inserting the guard in the middle of documentation,

My instinct is that we shouldn't be trying to play those games in the first place and should consider *all* leading comments and empty (whitespace-only) lines as part of the "license" and expect the first significant token to be the header guard. e.g., this isn't about the license at all, it's about whether you can have prose before the header guard or not. It's not uncommon for projects to put prose before header guards, nor is it uncommon for it to go after the header guards. tbh, that feels a bit like an option for the feature rather than an automatic behavior because I could also see a project wanting to enforce a consistent style.

A drive-by note: I'm also for treating all leading comments equally (and referring to them as just "comments", not "license comments").