This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Added diagnostics about incorrect usage of NOLINT comment
Needs RevisionPublic

Authored by xgsa on Dec 16 2017, 2:38 PM.

Details

Summary

As discussed in the previous review [1], diagnostics about incorrect usage of NOLINT comment was added, i.e..:

  • usage of NOLINT with unknown check name;
  • usage of NOLINT for line, where there is no diagnostics;
  • usage of NOLINT without closing parenthesis.

I have covered the implementation with tests, but I haven't updated the documentation yet, because I'd like to approve the implemented approach in general. If nobody insists, I'd prefer updating the documentation in follow-up patches, because this will make the patch even bigger and the review longer.

[1] - https://reviews.llvm.org/D40671

Diff Detail

Event Timeline

xgsa created this revision.Dec 16 2017, 2:38 PM
xgsa updated this revision to Diff 127260.Dec 16 2017, 2:55 PM

A few minor coding style fixes.

Eugene.Zelenko added inline comments.
clang-tidy/ClangTidyDiagnosticConsumer.cpp
172

Please separate with empty line.

173

Please include header for SmallVector.h.

328–329

Please use default members initialization for DiagEngine and Profile.

xgsa marked 3 inline comments as done.Dec 16 2017, 11:58 PM
xgsa added inline comments.
clang-tidy/ClangTidyDiagnosticConsumer.cpp
328–329

Actually, it is not necessary, as unique_ptr is initialized with nullptr by default. I have just removed initializing them here.

xgsa marked an inline comment as done.Dec 16 2017, 11:58 PM
xgsa updated this revision to Diff 127267.Dec 17 2017, 12:00 AM

Review comments were applied.

xgsa added inline comments.Dec 17 2017, 12:12 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
328–329

Sorry, you are right, just ignore my previous comment.

xgsa updated this revision to Diff 127268.Dec 17 2017, 12:21 AM

Review comments applied.

JVApen added a subscriber: JVApen.Dec 17 2017, 12:57 AM

I'm missing some documentation to understand the corner cases. How does this check behave with suppressed warnings for checks which ain't currently checked. (Using -no-... on a code base or suppressing the warnings via the pragmas)

xgsa added a comment.Dec 17 2017, 1:58 AM

I'm missing some documentation to understand the corner cases. How does this check behave with suppressed warnings for checks which ain't currently checked. (Using -no-... on a code base or suppressing the warnings via the pragmas)

The check just gathers NOLINT comments and compares them with a list of reported diagnostics for the current configuration. Thus, it will report diagnostics on the NOLINT comments, which are added for the checks, which are disabled in current configuration (e.g. with option -check=-some_check).

Note also that in spite NOLINT for clang warnings now works for clang-tidy, it doesn't work for clang itself, so I wouldn't recommended using something like "// NOLINT(clang-diagnostic-unused-variable)". As you mentioned, pragmas or -no-... options should be used instead. The check doesn't perform any processing of those things.

aaron.ballman added inline comments.Dec 22 2017, 6:38 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
23–24

Please keep the includes in sorted order.

134

We don't typically mark local, non-pointer types as const.

163

The & should bind to the right. However, instead of returning the concrete container, why not expose a range interface instead?

265

Spurious space after operator(). i does not match our coding conventions (and could use a better name).

270–271

The parameter names don't match our coding conventions.

277

Mark the constructor explicit?

280

Is there a better LLVM ADT to use here?

320

I'd reword this slightly to: "unknown check name '%0' specified in %1 directive"

329–330

How about: "unbalanced parentheses in %0 comment; all diagnostics silenced for this line" ?

369

Name does not match coding standard.

412

Making the cache volatile will have no impact on this.

Any reason not to make the cache mutable, however? That's quite common for implementation details.

485

Formatting.

556

s/change/chance?

Also: if diagnostics is filtered -> if diagnostics are filtered

829–830

I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.

I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".

One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.

clang-tidy/ClangTidyDiagnosticConsumer.h
29

Please add a newline after this.

146

This function can be marked const.

206

Comma after storeError()

299

Perhaps this would make more sense as an llvm::IndexedMap or other non-STL datatype?

xgsa marked 13 inline comments as done.Dec 23 2017, 3:01 PM

Aaron, thank you for your review and sorry for the coding convention mistakes -- I still cannot get used to the llvm coding convention, because it quite differs from the one I have been using in my projects.

clang-tidy/ClangTidyDiagnosticConsumer.cpp
280

This data structure provides the fast lookup by check name+line number and it's exactly what is necessary. What are the concerns about this data structure?

320

I'd used the word "comment" instead of "directive" for consistency with documentation and the other messages. The rest is applied.

412

Sorry, certainly, instead of "volatile" I meant "mutable".

Actually, using of "mutable" violates a constancy contract, as the field is get modified in a const method. Thus I'd tend to avoid using mutable, if possible, because e.g. in multi-threaded applications these fields require additional protection/synchronization. Moreover, I see that using of mutable is not very spread in clang-tidy. Thus as, currently, hasCheck is called from the non-constant context, I'd prefer leaving it non-const instead of making cache mutable. Please, let me know if you insist on the mutable option.

829–830

I don't think the user is going to care about the distinction between no diagnostics being triggered and the expected diagnostic not being triggered. Also, it's dangerous to claim the lint comment is redundant because it's possible the user has NOLINT(foo, bar) and while foo is not triggered, bar still is. The NOLINT comment itself isn't redundant, it's that the check specified doesn't occur.

I would consolidate those scenarios into a single diagnostic: "expected diagnostic '%0' not generated" and "expected diagnostic '%0' not generated for the following line".

This branch of if (NolintEntry.first.CheckName == NolintCommentsCollector::AnyCheck) reports only about NOLINT/NOLINTNEXTLINE comments without check list, so I suppose it's fair to claim that this comment is redundant (we have already checked that no single check reported diagnostics on the line). The else-branch reports the diagnostics for the definite check in a list in case of NOLINT(foo, bar) (actually, if neither foo nor bar checks reported diagnostics for the line, there will be a few diagnostics from nolint-usage - not sure if it's good, but it seems acceptable). That is why, I suppose, it is necessary to distinct these cases.

One concern I have with this functionality is: how should users silence a lint diagnostic that's target sensitive? e.g., a diagnostic that triggers based on the underlying type of size_t or the signedness of plain char. In that case, the diagnostic may trigger for some targets but not others, but on the targets where the diagnostic is not triggered, they now get a diagnostic they cannot silence. There should be a way to silence the "bad NOLINT" diagnostics.

There is such mechanism: it is possible to specify // NOLINT(nolint-usage) or //NOLINT(check1, check2, nolint-usage) to silence the nolint-usage`-mechanism. Please, see tests for details and more examples.

clang-tidy/ClangTidyDiagnosticConsumer.h
299

I think, the map of vectors fits well: it provides efficient adding, storing and iterating through elements. In spite items in vector can be duplicated, it's not a usual case (when there are a lot of diagnostics for the same check on the same line), so I don't think it worth using set. I don't see how llvm::IndexedMap could help here. What are the concerns about the current data structure?

xgsa updated this revision to Diff 128091.Dec 23 2017, 3:11 PM
xgsa marked an inline comment as done.

Review comments applied.

xgsa updated this revision to Diff 128092.Dec 23 2017, 3:17 PM

The full diff (but not only the incremental one) was uploaded. Please, skip previous revision. Sorry.

aaron.ballman added inline comments.Dec 24 2017, 8:50 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
163

This could use a typedef to make the return type more readable. Also, the function should be checkName(), dropping the "get".

280

Same as above.

369

Please do not name the parameter the same as a type name.

412

Use of mutable does not violate constancy; the cache is not exposed via any interface; it is purely an implementation detail. Very little of our code base is concerned with multithreaded scenarios (we use bit-fields *everywhere*, for instance).

I won't insist on using mutable if you are set against it, but this is the exact scenario in which it is the correct solution.

829–830

Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.

Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?

clang-tidy/ClangTidyDiagnosticConsumer.h
299

STL data structures are not particularly well-optimized; they're meant for general usage cases. This map can get very large in real projects, so I worry about performance concerns. That's why we have specialized ADTs -- you should generally prefer them to STL containers.

xgsa marked 5 inline comments as done.Dec 25 2017, 6:46 AM
xgsa added inline comments.
clang-tidy/ClangTidyDiagnosticConsumer.cpp
280

I have reviewed llvm guide [1] and found that it recommends using ordered vector in such cases. I have implemented this approach, however I'd like to notice that lookup complexity in reportRedundantNolintComments() increased from average O(1) for std::unordered_map to O(log(N)) for binary search. However, memory usage become less. As this collection is gathered for each translation unit, I don't expect millions of NOLINT comments in it, thus this approach looks suitable.

[1] - http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

412

I've just tried the mutalbe-approach and discovered one more issue: ClangTidyASTConsumerFactory requires non constant ClangTidyContext. Thus, if current approach is not suitable, either ClangTidyASTConsumerFactory should be refactored on const and nonconst parts or caching should be done in constructor (which makes fetching the names not lazy). Because of this, currently I am suggesting to leave hasCheck nonconstant.

829–830

Can you provide an example where this distinction will make a difference to the user and help clarify a confusing situation? I cannot think of one, and it would be nice to simplify this code.

Example for the diagnostics emitted in the if-branch:

class A2 { explicit A2(int i); }; // NOLINT
=> warning: there is no diagnostics on this line, the NOLINT comment is redundant

Thus, the whole NOLINT comment should be removed.

Example for the diagnostics emitted in the else-branch:

// Case: NO_LINT for the specific check on line with an error on another check.
class A4 { A4(int i); }; // NOLINT(misc-unused-parameters, google-explicit-constructor)
=> warning: there is no diagnostics for the 'misc-unused-parameters' check on this line, the NOLINT comment is redundant

In this case, only misc-unused-parameters check should be removed from the list, but not the whole NOLINT.

Note also that if in the last example there is only misc-unused-parameters in the NOLINT checks list, the diagnostics will be the same. However, I suppose it's acceptable, because even if a user removed only a check name from the check list without removing NOLINT itself, clang-tidy will suggest removing NOLINT itself on the next run.

Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?

It's my invention, I suppose, from the user point of view it's logical to consider this mechanism as a check.

clang-tidy/ClangTidyDiagnosticConsumer.h
299

I have reviewed llvm guide [1] and haven't found more suitable ADT, except std::map, which provides O(log(N)) complexity for lookup and insert against average O(1) for std::unordered_map. llvm::IndexedMap doesn't look suitable, as there could be a lot of lines in the file, whereas only a few will contain NOLINT comment, so llvm::IndexedMap won't store data efficiently.

Moreover, as far as I understand, this collection stores data for the translation unit and is cleaned up after it is processed, so there shouldn't be millions of diagnostics line numbers stored here.

Thus, I'd prefer leaving the container type as is to optimize for performance, but I would be also OK with std::map to optimize for memory usage.

[1] - http://llvm.org/docs/ProgrammersManual.html#picking-the-right-data-structure-for-a-task

xgsa updated this revision to Diff 128144.Dec 25 2017, 6:47 AM
xgsa marked an inline comment as done.

Review comments applied.

aaron.ballman added inline comments.Dec 27 2017, 1:00 PM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
412

Ah, okay, that makes sense to me -- thank you for looking into it, I think how you have it is reasonable enough then.

829–830

Thank you for the examples, but I don't think they would add any clarification for the user, either. That said, I think we could collapse all four diagnostics into: "there are no diagnostics %select{|for '%2'}0 on %select{this|the next}1 line" where %0 distinguishes between NOLINT with and without an explicit check name, %1 distinguishes between NOLINT and NOLINTNEXTLINE, and %2 is the name of the check in question (if any).

Thank you for the explanation about nolint-usage. This is not terminology I've seen before -- is this your invention, or is it a documented feature for NOLINT comments?

It's my invention, I suppose, from the user point of view it's logical to consider this mechanism as a check.

I think we should devise a better way of expressing this unless there's industry practice suggesting a specific term or syntax. nolint-usage isn't very discoverable. Whatever we wind up with, it should work naturally for constructs like your first example. In the first example, diagnosing that NOLINT is redundant only makes sense in some circumstances, such as when the construct is in a source file that is not compiled with google-explicit-constructor enabled. However, if the code is in a shared header file where some (but not all) source code is checked with google-explicit-constructor enabled, it serves a purpose and removing it would be incorrect. The same holds for NOLINT(check).

Ultimately, I think that diagnosing NOLINT on a line that issues no diagnostics should not be enabled by default. It's highly likely that the NOLINT directive is due to reasons unknowable to the implementation, and suggesting the user remove the comment seems unhelpful. My example above is one plausible case where NOLINT may serve a purpose. Another example is code checked by clang-tidy as well as other tools; the NOLINT may be silencing the other tools. I think it should take extra work by the user to enable this functionality in a catch-all manner -- either via different syntax or enabled only through an explicit flag.

I think that diagnosing NOLINT(check-name) when that diagnostic is not triggered is useful by default because the user is explicitly expecting a diagnostic by name. However, I think the user needs an almost-trivial way to silence the "not triggered" diagnostic on a case-by-case basis. Target specific diagnostics as well as shared code are both examples of situations where the NOLINT comment may be accurate but for situations unknown to clang-tidy. Telling the user to remove the NOLINT comment would be the wrong thing to do.

clang-tidy/ClangTidyDiagnosticConsumer.h
299

Being single-TU doesn't really matter much for number of lines processed -- generated code is quite common. That said, we can leave it until performance concerns become a problem in practice.

xgsa added inline comments.Jan 2 2018, 2:20 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
829–830

If to look at cpplint, which I suppose, NOLINT syntax was taken from, it also reports diagnostics for incorrect usage of NOLINT as messages in category 'readability/nolint' [1].

In spite of the fact, that nolint-usage is technically not a check, it mimics the behavior of a simple check: it is only enabled if nolint-usage is enabled via -check option (explicitly or via *) or configuration; it reports diagnostics for this category and allows this diagnostics to be suppressed via NOLINT. The only missing thing for now is documentation page for the nolint-usage check and I am going to work on this after this patch is approved. Thus, I don't think that this approach is not discoverable and it seems more natural than adding a separate command line option and a separate way of suppressing NOLINT diagnostics.

I agree that this diagnostics will only be useful if NOLINT comments are carefully maintained exactly for clang-tidy. I don't think that if they are also added for another tool, the check names or even diagnostics lines will match. Thus, for these cases the check just shouldn't be enabled in configuration. The same thing is for headers. I suppose, that typically the list of checks is the same for the whole project, thus diagnostics for headers won't be an issue, but if it is not the case, the nolint-usage check could just be disabled for some modules. Anyway, I don't see the way to determine if a header contains the diagnostics been included from the other source files.

If it is necessary, some configuration options could be added for this check (e.g. an option to check only NOLINT comments in source files, but no headers; or to check only NOLINT comments with/without explicit list of checks specification).

[1] - https://github.com/google/styleguide/blob/9663cabfeeea8f1307b1acde59471f74953b8fa9/cpplint/cpplint.py#L577

xgsa updated this revision to Diff 128431.Jan 2 2018, 10:17 AM

Rename the check nolint-usage => readability-nolint-usage for consistency.
Update diagnostics message according to the review comments.

aaron.ballman added inline comments.Jan 5 2018, 11:39 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
118

Why "readability"? That module is generally for code readability issues like redundant code or naming conventions. I think this should probably be in "misc" or perhaps "bugprone".

829–830

In spite of the fact, that nolint-usage is technically not a check

That's part of what concerns me about this in terms of discoverability and usability -- it's not a real check but it sort of behaves like one. Does this get listed with -list-checks? If the user passes -nolint-* on the command line does it still get disabled?

Also, all of this is predicated by the fact that these nolint warnings are going to trigger in such a way that sometimes needs to be silenced. What I think isn't very intuitive is that the recommended way to silence a nolint-comment-related diagnostic is to use another nolint comment to disable reporting bugs with nolint comments. However, documentation can help with that, so perhaps it's not bad.

xgsa added inline comments.Jan 6 2018, 3:48 AM
clang-tidy/ClangTidyDiagnosticConsumer.cpp
118

I consider this check as a detector of redundant comments (i.e. redundant code). Moreover, it seems, these comments influence only on code readability and don't bring bugs, so "bugprone" doesn't look good for me. One more minor argument for "readability" is that it is in such category in cpplint, so it will be more natural for its users. Summing up, I am for leaving the check in "readability", but I don't mind also putting it in "misc".

829–830

Does this get listed with -list-checks?

Not yet, I missed that, but according to my understanding, the "nolint-usage" should be in the list. I'll fix it.

If the user passes -nolint-* on the command line does it still get disabled?

It should work (there is the isCheckEnabled call in ClangTidyContext::setPreprocessor, so if no comments are collected, no checks will be done and no diagnostics reported).

What I think isn't very intuitive is that the recommended way to silence a nolint-comment-related diagnostic is to use another nolint comment to disable reporting bugs with nolint comments.

Could you please suggest a more intuitive way as an example?

However, documentation can help with that, so perhaps it's not bad.

So is documentation the only concern about this patch or don't you like the overall idea?

xgsa updated this revision to Diff 128893.Jan 7 2018, 11:24 PM

Fixed showing the check in -list-checks.

alexfh added a comment.Jan 9 2018, 8:27 AM

A high-level comment: NOLINT category parsing and warning on incorrect NOLINT categories makes it more difficult for code to comply both with clang-tidy and cpplint (https://github.com/cpplint/cpplint), since:

  1. the category names in these tools are different,
  2. cpplint currently doesn't support more than one category with the // NOLINT(category) syntax,
  3. cpplint complains on // NOLINT directives with an unknown category.

So when suppressing a warning that is present in both clang-tidy and cpplint (e.g. google-runtime-int) one has to use // NOLINT without categories, which kind of undermines the effort of allowing category names explicitly specified in suppression directives.

There are multiple solutions possible. One would be to add a clang-tidy-specific suppression directive and only support specifying check names in it (// NOCLANGTIDY seems verbose, but I don't have a better name. Suggestions are welcome.). Leaving support for // NOLINT without category names would be nice for suppressing warnings common with cpplint.

Another option would be changing cpplint to somehow recognize and ignore clang-tidy check names in the suppressed categories list.

clang-tidy/ClangTidy.cpp
402

clang-format, please

alexfh requested changes to this revision.Mar 15 2018, 1:55 AM
This revision now requires changes to proceed.Mar 15 2018, 1:55 AM