In case a checker is registered multiple times as an alias, the emitted warnings are uniqued by the report message. However, it is random which checker name is included in the warning. When processing the output of clang-tidy this behavior caused some problems. In this commit the uniquing key contains the checker name too.
Details
Diff Detail
- Repository
- rL LLVM
Event Timeline
LessClangTidyError only compares location and message, but it could also compare other things like notes, fixes, etc. For the problem outlined in the description of this patch we can probably include the checker name into the key. WDYT?
Yes, LessClangTidyError would really be the best place for this. The reason of my implementation was that I wanted to bind this feature to a command line option. I don't know if there was any design decision behind the current uniquing logic and I didn't want to break it if there was any. So would you be agree with having duplicate reports by default without any command line flag in case of alias checkers? I believe this would be the most painless solution, since in case of further improvements about including notes, fixes, etc. in the functor, the potential command line option controlling this would be unnecessarily complex. I don't know if there could be any use-case on those.
Thanks for the review!
I think it will be a strict improvement to include the check name into the deduplication key (probably after the file and offset and before the message). I don't see any reason to hide this behind a flag or otherwise retain the old behavior. As for expanding the key to include notes and fixes - it's probably good to do this either, and this may help uncover incorrect behavior of some checks. I'd suggest to start with the check name though.
So the problem you're trying to solve is that the output is non-deterministic, is that correct?
However, it is random which checker name is included in the warning.
If that is indeed the problem, then I think the solution should be making the output deterministic -- not printing the message multiple times...
Not exactly. The problem is that it is non-deterministic which checker reports the issue. For example before this patch, in the test file above there was only one report. However, sometimes the report message is:
throw expression should throw anonymous temporary values instead [cert-err09-cpp]
and sometimes the message is:
throw expression should throw anonymous temporary values instead [cert-err61-cpp]
(note the content of the square bracket)
So after this patch both of these lines will be emitted.
Producing the message two times is worse user experience than producing one. Most users don't care which checker produced the message. However, the output should be deterministic. Therefore, a better fix would be making deduplication deterministic, instead of printing the message twice.
This suggestion would result another strange behavior: if the user disables cert-err09-cpp because he or she doesn't want to see its reports, the other one (cert-err61-cpp) will still report the issue. So he or she has to disable both (or as many aliases it has).
As far as I know the idea behind aliases is that a checker can be registered with different options on different names. For example a code style checker can be registered with different names (e.g. GCC-style, LLVM-style, etc.) with different checker options on code styles. This could be another advantage of this patch, since it would be annoying to see a message in which a GCC-style checker warns about an LLVM-style violation.
This suggestion would result another strange behavior: if the user disables cert-err09-cpp because he or she doesn't want to see its reports, the other one (cert-err61-cpp) will still report the issue. So he or she has to disable both (or as many aliases it has).
That seems to be the case regardless of the implementation strategy in this patch.
This could be another advantage of this patch, since it would be annoying to see a message in which a GCC-style checker warns about an LLVM-style violation.
I don't understand the scenario. I think people wouldn't enable both checkers in the first place.
They can silently get enabled if you follow the -Weverything approach - enable all checks, and selectively disable.
Current clang-tidy check aliasing really looks misdesigned..
-Weverything is not recommended for anything except compiler testing, and similarly with clang-tidy, enabling all checkers is not a good idea. I don't think improving this particular point will make enabling all checks more usable.
That is beyond the point here (and is very arguable).
Even just enabling all cert-* checks will result in this behavior.
The difference is that without the patch the user will initially see only one of the warnings, but after disabling the corresponding check the other warning will start appearing. Thus, the behavior with more granular deduplication results in a more consistent and less surprising experience.
As noted earlier, warning deduplication can hide different issues in checks that would usually better be fixed in the check itself. For example, the google-explicit-constructor check used in the test for deduplication (https://reviews.llvm.org/D2989) has been fixed since then as were a number of other checks.
clang-tools-extra/test/clang-tidy/duplicate-reports.cpp | ||
---|---|---|
6–7 ↗ | (On Diff #213356) | I believe, this is not true anymore. Including the check name into the comparison key should make the order deterministic. |
Since now the order of diagnostic messages is deterministic, we can count on this order in the test file.
I have no authority to accept patches in clang-tidy (though please feel free to add me as a reviewer, I can more easily participate in the discussion!), but in any case, this looks good to me too, thanks!
Thank you for the valuable comments and the review!
I'm just planning to register to the bug tracker, because it is getting more relevant due to some other contributions as well.