This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][NFC] Fix inconsistent references to checkers as "checks"
ClosedPublic

Authored by Szelethus on Sep 3 2019, 7:35 PM.

Details

Summary

Traditionally, clang-tidy uses the term check, and the analyzer uses checker, but in the very early years, this wasn't the case, and code originating from the early 2010's still incorrectly refer to checkers as checks.

This patch attempts to hunt down most of these, aiming to refer to checkers as checkers, but preserve references to callback functions (like checkPreCall) as checks.

One thing that is still off in this context (haha) however is CheckerContext, which is also, in this sense, an incorrect name: it isn't a context of a checker, but rather a single callback. Shall I change that to CheckContext?

Diff Detail

Event Timeline

Szelethus created this revision.Sep 3 2019, 7:35 PM
Szelethus added subscribers: gribozavr, alexfh.

@gribozavr, @alexfh this isn't a particularly exciting patch, but it does highlight a fundamental difference in between the analyzer and clang-tidy, so I added you for the sake of it, since we're talking about a closer interaction anyway :)

NoQ accepted this revision.Sep 3 2019, 10:50 PM

Yay thanks!!

In my understanding "check" is a feature of the tool, while "checker" is an entity within the tool that implements that feature. In most cases there's no difference, though, so yeah, it's a tradition thing. I generally enjoy being able to say things like "Malloc checker (singular!) provides checks (plural!) for use-after-free bugs and memory leaks". Also i believe that "check" in checkPreCall is a verb, so yeah, it should be kept.

a fundamental difference in between the analyzer and clang-tidy

Honestly, i'm much more worried about message capitalization :)

This revision is now accepted and ready to land.Sep 3 2019, 10:50 PM
gribozavr accepted this revision.Sep 3 2019, 11:38 PM

Thanks! Yay consistency.

I prefer the term "checker" to refer to individual modules because I feel it is more precise and less ambiguous. In phrases like "malloc check", "make_unique check", it is unclear what does the check -- malloc itself, the caller of malloc, or something else. Therefore, I would be supportive of repainting ClangTidy to also use "checker", but I would want to know what @alexfh thinks about it before we do it in ClangTidy.

Thanks! Yay consistency.

I prefer the term "checker" to refer to individual modules because I feel it is more precise and less ambiguous. In phrases like "malloc check", "make_unique check", it is unclear what does the check -- malloc itself, the caller of malloc, or something else. Therefore, I would be supportive of repainting ClangTidy to also use "checker", but I would want to know what @alexfh thinks about it before we do it in ClangTidy.

Historically, clang-tidy only used the term "check" (to denote the thing that checks something, rather than the rule being checked or the act of checking), and we tried to keep its use consistent. However, "checker" is a more precise and less ambiguous way to convey this meaning. I support to use the term "checker" in clang-tidy, as long as someone is willing to update the code and documentation (except for verbs, e.g. the check() method ;). Also note that there's a non-trivial number of out-of-tree check(er)s out there. They will need to be updated as well.

Adding Aaron in case he has a different opinion.

Szelethus edited the summary of this revision. (Show Details)Sep 4 2019, 9:56 AM
In D67140#1656831, @NoQ wrote:

Honestly, i'm much more worried about message capitalization :)

Likewise. I wish the static analyzer would follow the usual conventions followed by clang and clang-tidy. ;-)

Historically, clang-tidy only used the term "check" (to denote the thing that checks something, rather than the rule being checked or the act of checking), and we tried to keep its use consistent. However, "checker" is a more precise and less ambiguous way to convey this meaning. I support to use the term "checker" in clang-tidy, as long as someone is willing to update the code and documentation (except for verbs, e.g. the check() method ;). Also note that there's a non-trivial number of out-of-tree check(er)s out there. They will need to be updated as well.

Adding Aaron in case he has a different opinion.

My primary concern is with needless churn for out-of-tree clients. They don't get any real added benefit from the change in nomenclature, but renaming ClangTidyCheck to ClangTidyChecker will break every single out of tree clang-tidy checker. I've not used the plugin infrastructure for clang-tidy, but will this cause plugins to fail to load? If so, is it a silent failure or a noisy one?

In D67140#1656831, @NoQ wrote:

Honestly, i'm much more worried about message capitalization :)

Likewise. I wish the static analyzer would follow the usual conventions followed by clang and clang-tidy. ;-)

I have the opposite opinion -- I wish that ClangTidy used complete sentences, and multiple sentences if it makes sense. The sentence fragments are too brief to explain complex and nuanced topics that ClangTidy communicates about. ClangTidy often plays the role of a developer education tool. It is not a guard like a compiler; developers can totally ignore ClangTidy if they disagree with the message. The better we can explain the problem, the more likely it is the developer will act on the message. I believe static analysis tools would be better off if we could write multiple sentences in the diagnostic.

Even for compiler messages, a sentence fragment is sometimes too concise.

Historically, clang-tidy only used the term "check" (to denote the thing that checks something, rather than the rule being checked or the act of checking), and we tried to keep its use consistent. However, "checker" is a more precise and less ambiguous way to convey this meaning. I support to use the term "checker" in clang-tidy, as long as someone is willing to update the code and documentation (except for verbs, e.g. the check() method ;). Also note that there's a non-trivial number of out-of-tree check(er)s out there. They will need to be updated as well.

Adding Aaron in case he has a different opinion.

My primary concern is with needless churn for out-of-tree clients. They don't get any real added benefit from the change in nomenclature, but renaming ClangTidyCheck to ClangTidyChecker will break every single out of tree clang-tidy checker. I've not used the plugin infrastructure for clang-tidy, but will this cause plugins to fail to load? If so, is it a silent failure or a noisy one?

A lot of the changes we do in Clang break many clients. I think we should try to find the best API.

We could add a deprecated typedef for, say, one release, if you think that would be significantly helpful for out-of-tree users.

The pre-compiled plugins will fail to load, but they will fail to load regardless of whether we make this change or not. The plugins will have to be recompiled anyway (and they will fail to compile), since Clang does not provide a stable library ABI.

In D67140#1656831, @NoQ wrote:

Honestly, i'm much more worried about message capitalization :)

Likewise. I wish the static analyzer would follow the usual conventions followed by clang and clang-tidy. ;-)

I have the opposite opinion -- I wish that ClangTidy used complete sentences, and multiple sentences if it makes sense. The sentence fragments are too brief to explain complex and nuanced topics that ClangTidy communicates about. ClangTidy often plays the role of a developer education tool. It is not a guard like a compiler; developers can totally ignore ClangTidy if they disagree with the message. The better we can explain the problem, the more likely it is the developer will act on the message. I believe static analysis tools would be better off if we could write multiple sentences in the diagnostic.

Even for compiler messages, a sentence fragment is sometimes too concise.

I agree with you in principle, but practicality still matters. I don't imagine we're going to go back and change the thousands of diagnostics in Clang to be complete sentences, and I prefer my diagnostics to be consistent. It's jarring when one part of the compiler uses one style of diagnostics and another part of the compiler uses a different style. So while I'd love it if we had more descriptive diagnostics, I would be happy to settle for consistent styles of diagnostics.

Historically, clang-tidy only used the term "check" (to denote the thing that checks something, rather than the rule being checked or the act of checking), and we tried to keep its use consistent. However, "checker" is a more precise and less ambiguous way to convey this meaning. I support to use the term "checker" in clang-tidy, as long as someone is willing to update the code and documentation (except for verbs, e.g. the check() method ;). Also note that there's a non-trivial number of out-of-tree check(er)s out there. They will need to be updated as well.

Adding Aaron in case he has a different opinion.

My primary concern is with needless churn for out-of-tree clients. They don't get any real added benefit from the change in nomenclature, but renaming ClangTidyCheck to ClangTidyChecker will break every single out of tree clang-tidy checker. I've not used the plugin infrastructure for clang-tidy, but will this cause plugins to fail to load? If so, is it a silent failure or a noisy one?

A lot of the changes we do in Clang break many clients. I think we should try to find the best API.

Yes, but we also tend to push back on causing needless churn. Then again, with the recent resurfacing of discussions about renaming everything under the sun, maybe we've changed our community opinion here. :-D I guess I don't see Check vs Checker to be worthy of breaking everyone's out-of-tree code over.

We could add a deprecated typedef for, say, one release, if you think that would be significantly helpful for out-of-tree users.

I think that's a pretty reasonable way forward. We should probably also put it in the release notes as well.

The pre-compiled plugins will fail to load, but they will fail to load regardless of whether we make this change or not. The plugins will have to be recompiled anyway (and they will fail to compile), since Clang does not provide a stable library ABI.

Ah, good to know! That reduces my concern, but doesn't negate it. AFAIK, we haven't changed the interface such that it requires code changes rather than just a recompile in recent history, so this is a bit novel.

My personal feeling is: by itself, this isn't worth the churn but the fix to downstream code is pretty easy so I'm not strongly opposed. However, do we have other "I wish we changed this interface, but that would break the world" issues we want to tackle in this release for clang-tidy? That might make this refactoring more worth the pain.

Ah, good to know! That reduces my concern, but doesn't negate it. AFAIK, we haven't changed the interface such that it requires code changes rather than just a recompile in recent history, so this is a bit novel.

I think API changes happen all the time. At Google, we are integrating upstream LLVM and Clang changes into our internal codebase daily. We have a lot of internal ClangTidy checkers. Fixing up all our internal code to keep with upstream changes is a full time job for one engineer (but it is a rotation).

My personal feeling is: by itself, this isn't worth the churn but the fix to downstream code is pretty easy so I'm not strongly opposed. However, do we have other "I wish we changed this interface, but that would break the world" issues we want to tackle in this release for clang-tidy? That might make this refactoring more worth the pain.

I'll think about more stuff to fix.

Ah, good to know! That reduces my concern, but doesn't negate it. AFAIK, we haven't changed the interface such that it requires code changes rather than just a recompile in recent history, so this is a bit novel.

I think API changes happen all the time. At Google, we are integrating upstream LLVM and Clang changes into our internal codebase daily. We have a lot of internal ClangTidy checkers. Fixing up all our internal code to keep with upstream changes is a full time job for one engineer (but it is a rotation).

I think this sort of backs up the point I was making. It requires an FTE to keep up with the breaks already. I'm worried about the folks who don't have the same resources that Google has. Where you get a new version of Clang every N months (rather than tracking ToT) and it's a scramble to make everything work again with the newest version, which delays adopting the newest Clang version until one lucky developer fixes all the checks.

API changes happen all the time, but rarely do they break *everything* for so little gain. I'm not asking for API stability guarantees, but we should still recognize that not everyone has Google's resources for keeping things working.

Then again, with the recent resurfacing of discussions about renaming everything under the sun, maybe we've changed our community opinion here. :-D I guess I don't see Check vs Checker to be worthy of breaking everyone's out-of-tree code over.

This pretty much summarizes my feelings on this: Changing this in ClangTidy would be better, but probably wouldn't be worth it.

In D67140#1656831, @NoQ wrote:

Honestly, i'm much more worried about message capitalization :)

Likewise. I wish the static analyzer would follow the usual conventions followed by clang and clang-tidy. ;-)

I have the opposite opinion -- I wish that ClangTidy used complete sentences, and multiple sentences if it makes sense. The sentence fragments are too brief to explain complex and nuanced topics that ClangTidy communicates about. ClangTidy often plays the role of a developer education tool. It is not a guard like a compiler; developers can totally ignore ClangTidy if they disagree with the message. The better we can explain the problem, the more likely it is the developer will act on the message. I believe static analysis tools would be better off if we could write multiple sentences in the diagnostic.

Even for compiler messages, a sentence fragment is sometimes too concise.

I agree with you in principle, but practicality still matters. I don't imagine we're going to go back and change the thousands of diagnostics in Clang to be complete sentences, and I prefer my diagnostics to be consistent. It's jarring when one part of the compiler uses one style of diagnostics and another part of the compiler uses a different style. So while I'd love it if we had more descriptive diagnostics, I would be happy to settle for consistent styles of diagnostics.

I personally disagree with this point. I also think that ClangTidy and the Static Analyzer play a drastically different role compared to regular compiler diagnostics, and we should regard them as such. That said, I don't integrate the Static Analyzer into my editor, and use a different tool to view its results.

Ah, good to know! That reduces my concern, but doesn't negate it. AFAIK, we haven't changed the interface such that it requires code changes rather than just a recompile in recent history, so this is a bit novel.

I think API changes happen all the time. At Google, we are integrating upstream LLVM and Clang changes into our internal codebase daily. We have a lot of internal ClangTidy checkers. Fixing up all our internal code to keep with upstream changes is a full time job for one engineer (but it is a rotation).

I think this sort of backs up the point I was making.

Sorry, but in the previous comment you were saying that "we haven't changed the interface such that it requires code changes". So which is it, for you?

It requires an FTE to keep up with the breaks already.

Exactly. So one more or one less API change that requires a trivial code change *does not matter*. People working on the Clang upgrade still have a ton of other, complex, changes to deal with. Is it more work to upgrade for this sort of API rename? Yes. Is meaningfully more work, if you look at everything that needs to be done as a whole? No. It will not be a thing that makes or breaks someone's Clang upgrade.

In D67140#1656831, @NoQ wrote:

Honestly, i'm much more worried about message capitalization :)

Likewise. I wish the static analyzer would follow the usual conventions followed by clang and clang-tidy. ;-)

I have the opposite opinion -- I wish that ClangTidy used complete sentences, and multiple sentences if it makes sense. The sentence fragments are too brief to explain complex and nuanced topics that ClangTidy communicates about. ClangTidy often plays the role of a developer education tool. It is not a guard like a compiler; developers can totally ignore ClangTidy if they disagree with the message. The better we can explain the problem, the more likely it is the developer will act on the message. I believe static analysis tools would be better off if we could write multiple sentences in the diagnostic.

Even for compiler messages, a sentence fragment is sometimes too concise.

I agree with you in principle, but practicality still matters. I don't imagine we're going to go back and change the thousands of diagnostics in Clang to be complete sentences, and I prefer my diagnostics to be consistent. It's jarring when one part of the compiler uses one style of diagnostics and another part of the compiler uses a different style. So while I'd love it if we had more descriptive diagnostics, I would be happy to settle for consistent styles of diagnostics.

Why do static analysis tools have to be consistent with Clang in its message style?

Ah, good to know! That reduces my concern, but doesn't negate it. AFAIK, we haven't changed the interface such that it requires code changes rather than just a recompile in recent history, so this is a bit novel.

I think API changes happen all the time. At Google, we are integrating upstream LLVM and Clang changes into our internal codebase daily. We have a lot of internal ClangTidy checkers. Fixing up all our internal code to keep with upstream changes is a full time job for one engineer (but it is a rotation).

I think this sort of backs up the point I was making.

Sorry, but in the previous comment you were saying that "we haven't changed the interface such that it requires code changes". So which is it, for you?

Both. :-) It's been my experience that there are relatively few breaking changes to clang-tidy checks (but they certainly do happen). Your experience is that you need an entire FTE to deal with the amount of breaks. Both of our experiences can be valid simultaneously.

It requires an FTE to keep up with the breaks already.

Exactly. So one more or one less API change that requires a trivial code change *does not matter*. People working on the Clang upgrade still have a ton of other, complex, changes to deal with. Is it more work to upgrade for this sort of API rename? Yes. Is meaningfully more work, if you look at everything that needs to be done as a whole? No. It will not be a thing that makes or breaks someone's Clang upgrade.

That's one perspective on it, yes. As I said, my experience is different than yours, which is probably why I'm more hesitant to make sweeping breaking changes like this when they have so little tangible benefit. You claim it won't make or break someone's Clang upgrade, but it certainly *can*. I've worked for places that, when we didn't have the resources to deal with breakage from a compiler upgrade, we delayed upgrading to that compiler until after we handled the issues raised, and I don't think this was all that unusual of a practice.

I'm not saying *no* to this change, despite my push-back. We don't have source compatibility or API stability guarantees, so it would be unreasonable to think we'll never make a breaking change. I just don't think it's a good software engineering practice to be cavalier about breaking 100% of downstream users without a very good reason to do so. My opinion is that check vs checker is not a very good reason to break the world, but I'm not willing to hold up the patch over it if that opinion isn't shared by the community.

Why do static analysis tools have to be consistent with Clang in its message style?

I don't think it's a requirement (so long as the diagnostics are clear about the issue being diagnosed, I'm happy enough), but I think it's good for a tool to be self-consistent in its messaging. It's jarring that one part of the compiler emits diagnostics one way and another emits them a totally different way. It may be less of an impact for people who don't see the output from both at the same time, but that happens to be the use case I have. As a somewhat related example of inconsistencies, the Clang static analyzer does not report compiler diagnostics through it's bug reporter interface, which means that exporting analysis results to SARIF only exports the static analyzer bugs. This has caught quite a few people by surprise at GrammaTech and they've asked when Clang can start reporting all of the diagnostics instead of just some of them (aka, they thought this behavior was a bug), but then the inconsistencies in messaging would be more obvious.

NoQ added a comment.Sep 5 2019, 2:33 PM

I don't think it's a requirement (so long as the diagnostics are clear about the issue being diagnosed, I'm happy enough), but I think it's good for a tool to be self-consistent in its messaging. It's jarring that one part of the compiler emits diagnostics one way and another emits them a totally different way. It may be less of an impact for people who don't see the output from both at the same time, but that happens to be the use case I have.

Unfortunately i think at this point many clients who tried to integrate multiple tools resorted to "Automatic Message Recapitalization" (c). For that reason we probably can harmlessly decapitalize Static Analyzer messages. I suspect that it won't really change anything either, because most tools will still be afraid of accidental inconsistencies in the compiler.

In D67140#1659831, @NoQ wrote:

I don't think it's a requirement (so long as the diagnostics are clear about the issue being diagnosed, I'm happy enough), but I think it's good for a tool to be self-consistent in its messaging. It's jarring that one part of the compiler emits diagnostics one way and another emits them a totally different way. It may be less of an impact for people who don't see the output from both at the same time, but that happens to be the use case I have.

Unfortunately i think at this point many clients who tried to integrate multiple tools resorted to "Automatic Message Recapitalization" (c). For that reason we probably can harmlessly decapitalize Static Analyzer messages. I suspect that it won't really change anything either, because most tools will still be afraid of accidental inconsistencies in the compiler.

If we're okay with the inconsistency but still feel like giving ourselves more work, adding proper punctuation to Static Analyzer diagnostics would at least make them grammatically correct. :-D

I don't have strong feelings on capitalization itself either, but the idea of longer, multi-sentence bug report messages (maybe this could finally be used? D66572#inline-602143) sound pretty cool.

NoQ added a comment.Sep 5 2019, 3:14 PM

If we're okay with the inconsistency but still feel like giving ourselves more work, adding proper punctuation to Static Analyzer diagnostics would at least make them grammatically correct. :-D

I'd love us some punctuation! Unfortunately the last time a person who actually speaks English committed actively to the Static Analyzer was, like, a couple of years ago at least (:

In D67140#1659907, @NoQ wrote:

If we're okay with the inconsistency but still feel like giving ourselves more work, adding proper punctuation to Static Analyzer diagnostics would at least make them grammatically correct. :-D

I'd love us some punctuation! Unfortunately the last time a person who actually speaks English committed actively to the Static Analyzer was, like, a couple of years ago at least (:

I guess there's a joke waiting to be made about a couple Hungarians and Russians discussing how to write proper English :^) With that said, we could ask @whisperity to chip in more, he has an ever-growing academic background, and a keen eye for grammatical errors.

We should take a page from desktop software here. If the messages were in a separate file, there would be a lot of people capable of mass-editing them. When messages are hardcoded in the tool code, navigating and editing them requires more skill, and definitely a lot more jumping around.

In D67140#1659907, @NoQ wrote:

If we're okay with the inconsistency but still feel like giving ourselves more work, adding proper punctuation to Static Analyzer diagnostics would at least make them grammatically correct. :-D

I'd love us some punctuation! Unfortunately the last time a person who actually speaks English committed actively to the Static Analyzer was, like, a couple of years ago at least (:

I guess there's a joke waiting to be made about a couple Hungarians and Russians discussing how to write proper English :^)

LOL!

With that said, we could ask @whisperity to chip in more, he has an ever-growing academic background, and a keen eye for grammatical errors.

I am also happy to help out with this task, though it would have to be on my own time as a best-faith effort.

We should take a page from desktop software here. If the messages were in a separate file, there would be a lot of people capable of mass-editing them. When messages are hardcoded in the tool code, navigating and editing them requires more skill, and definitely a lot more jumping around.

Agreed -- this is why Clang puts its diagnostics into .td files (it's not perfect because people still sometimes use positional args when %select{} would be easier to localize, but it's a great start). I think the difficulty the static analyzer and clang-tidy would have doing this is in figuring out how to organize the diagnostics without breaking checker layering.

NoQ added a comment.Sep 9 2019, 5:21 PM

We should take a page from desktop software here. If the messages were in a separate file, there would be a lot of people capable of mass-editing them. When messages are hardcoded in the tool code, navigating and editing them requires more skill, and definitely a lot more jumping around.

In the Static Analyzer there's often an explosive amount of dynamically generated messages that are going to be pretty hard to stuff into a tablegen pattern. Say, you can probably turn this into "%0 %1 %2 with a %3 retain count into an out parameter %4%5" but would it really help?

In D67140#1664106, @NoQ wrote:

We should take a page from desktop software here. If the messages were in a separate file, there would be a lot of people capable of mass-editing them. When messages are hardcoded in the tool code, navigating and editing them requires more skill, and definitely a lot more jumping around.

In the Static Analyzer there's often an explosive amount of dynamically generated messages that are going to be pretty hard to stuff into a tablegen pattern. Say, you can probably turn this into "%0 %1 %2 with a %3 retain count into an out parameter %4%5" but would it really help?

Unfortunately, that message is already not following best practices. One should not be passing snippets in natural language into substitutions. Every character that is a natural language construct should be in the message string. The only allowed substitutions are types, names, numbers and such. We already support %select in message strings, but there are frameworks that are more flexible -- we can port features from them into our message strings as needed.

In D67140#1664106, @NoQ wrote:

We should take a page from desktop software here. If the messages were in a separate file, there would be a lot of people capable of mass-editing them. When messages are hardcoded in the tool code, navigating and editing them requires more skill, and definitely a lot more jumping around.

In the Static Analyzer there's often an explosive amount of dynamically generated messages that are going to be pretty hard to stuff into a tablegen pattern. Say, you can probably turn this into "%0 %1 %2 with a %3 retain count into an out parameter %4%5" but would it really help?

Unfortunately, that message is already not following best practices. One should not be passing snippets in natural language into substitutions. Every character that is a natural language construct should be in the message string. The only allowed substitutions are types, names, numbers and such. We already support %select in message strings, but there are frameworks that are more flexible -- we can port features from them into our message strings as needed.

I think this is a great goal. Clang doesn't always use %select{} to best effect (sometimes we still pass in hard-coded strings), but that's the exception rather than the rule.

NoQ added a comment.Sep 10 2019, 10:22 AM

%select

%select{}

Whoa, this could actually work then. I like it.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptSep 12 2019, 12:11 PM

Yes, thank you! I've been keeping my mailbox open and commiting slowly, it seems like the buildbots have a wrong email address set up for me.