
clang-tidy doc: Add the severities description
AbandonedPublic

Authored by sylvestre.ledru on Dec 28 2019, 9:33 AM.

Details

Summary

I took the text from codechecker. Daniel Krupp ( @dkrupp ) wrote it.

Event Timeline

Herald added a project: Restricted Project. · Dec 28 2019, 9:33 AM

I think this is a reasonable first-pass at the severity descriptions, but do you think we should add some words that tell the user to use whatever severity is picked by a coding standard if one is being followed? For instance, the CERT rules all come with a severity specified by the rule itself. I don't see a good reason for us to pick a different severity than what is listed by the rule author, except if each coding standard has drastically different ideas about severity (which I don't think is the case from what I can tell).

It's very hard to write these kinds of definitions without ambiguity and plenty of subjective interpretation creeping in. I'll try my best to provide constructive feedback but I'm admittedly struggling a bit with providing helpful counter proposals.
Ideally these levels would be based on a data driven approach, like when we say that something is error prone we should be able to support that with data.

I think the high severity is well defined it's distinctly different from the other levels from a qualitative perspective.
I would also like to suggest the addition that any check reporting security vulnerabilities should be classified as high severity.
One way to make all levels qualitatively different would be to use something like the following definitions:

  • Medium: things that are not a direct error but are somehow error prone (ideally there would be data to support the claim that the issue often causes errors)
  • Low: things that are neither direct errors nor prone to indirectly causing errors, but which still cause quality issues like unnecessarily low performance.
  • Style: things that are handled by clang-format rather than clang-tidy.
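The ordering sketched in the bullets above could be written down as a small, purely illustrative classification (the names and ranks below are an assumption drawn from this comment, not anything that exists in clang-tidy):

```python
from enum import IntEnum

class Severity(IntEnum):
    """Hypothetical ranking of the proposed levels; illustrative only."""
    STYLE = 0   # formatting/naming: could be a clang-format rule
    LOW = 1     # no error risk, but a quality issue (e.g. performance)
    MEDIUM = 2  # not a direct error, but demonstrably error prone
    HIGH = 3    # direct errors and security vulnerabilities

# The levels form a total order, so reports could be filtered by a threshold.
assert Severity.STYLE < Severity.LOW < Severity.MEDIUM < Severity.HIGH
```

Making the levels qualitatively distinct, as proposed, is what would let such an ordering be applied consistently across checks.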

Given how hard it is to write these kinds of definitions clearly, I'm not sure it is a good idea to introduce severities for clang-tidy checks.
Unless we can find very strong definitions for distinctly different levels, supported by actual data, it might be better to just not do it.
Also, I think a difference in severity is already implied by whether an issue is reported by clang-format, clang-tidy, or clang diagnostics,
and by defining these severity levels we need to make sure they don't conflict in any way with those already-implied severity levels.

In short: My first suggestion is to just remove severity from the table instead of trying to improve the definitions.
If there is a strong preference towards keeping them I suggest making them more qualitatively different as described above.

clang-tools-extra/docs/clang-tidy/checks/list.rst
414

It seems like there is a bit of overlap between STYLE and LOW; they both mention readability.
Might I suggest that style cover only issues related to naming conventions and text formatting, such as indentation and line breaks: the kinds of things that could be clang-format rules?

418

To me it's not obvious why these examples aren't medium severity. They seem to fit the description of medium severity, they are also very similar to the example in medium severity.

423

Does something need to always result in division by zero to be of high severity or is it enough that it introduces the possibility for division by zero to occur?

I do agree that they are subjective and not perfect.

However, I found the classification extremely useful when you look at the results on big projects.
I have been using CodeChecker (where the severities come from) on Firefox, and it has been extremely useful for evaluating the importance of the checkers.

For instance, the CERT rules all come with a severity specified by the rule itself

Did you see some difference?

if each coding standard has drastically different ideas about severity

Do you have some examples of this occurrence?

thanks for the feedback

I do agree that they are subjective and not perfect.

However, I found the classification extremely useful when you look at the results on big projects.
I have been using CodeChecker (where the severities come from) on Firefox, and it has been extremely useful for evaluating the importance of the checkers.

IMO, that usefulness comes from consistency when picking a severity. I share the concern that these are pretty subjective descriptions currently. For instance, the guidance you give in this patch is somewhat different than the guidance picked by CERT (https://wiki.sei.cmu.edu/confluence/display/c/How+this+Coding+Standard+is+Organized#HowthisCodingStandardisOrganized-RiskAssessment) and this will lead to discrepancies if it hasn't already.

For instance, the CERT rules all come with a severity specified by the rule itself

Did you see some difference?

I've not looked for them specifically yet (tbh, this severity thing caught me off guard, I didn't see the reviews for adding it), but my concern comes from the fact that the process of picking severity already differs between what's written and one of the coding standards we have checks for.

if each coding standard has drastically different ideas about severity

Do you have some examples of this occurrence?

Not off the top of my head. I think it would be useful for someone to look at the coding standards we currently have clang-tidy checks for to see if those standards specify a severity for their rules. From there, we can see what commonalities there are between the coding standards and see if we can come up with a heuristic for picking a severity that roughly matches. Or maybe we should only specify a severity when one is picked by a coding standard and not attempt to determine our own.

  • Style: things that are handled by clang-format rather than clang-tidy.

This is not true, for two reasons. The shorter answer: if it were true, the "severity category" STYLE would be a moot point in CodeChecker, as CodeChecker (currently) catalogues and displays CSA and CT diagnostics.

Moreover, there are plenty of stylistic constructs (readability-, some modernize-, etc.) which are not handled, and maybe cannot be handled, by clang-format at a "token sequence" level.
For example, using record-member-init-exprs instead of initializers in the constructor is not something clang-format could reasonably fix, as per my understanding. It is also not something it should fix; whether a Tidy rule is enabled (or not...) for a project is the project's decision about what stylistics or nomenclature (hah, maybe this check, in particular, should be moved from low to style @sylvestre.ledru 😈) they want to embrace.

A low diagnostic (and everything "above", assuming a(n at least) partial ordering on severities) should mean the coding construct is problematic: there is a chance -- perhaps once in a lifetime, perhaps not -- that doing this will cause a real error. A style thing should mean "Hey, whatever you have written is pretty much A-OK and works, congrats for writing valid (Obj)?C(++)?, but please realise that we are writing DERP/C++-42 and not C89, and please move your loop variable inside the loop's statement". No real "game-breaking" issue should ever arise from deciding on fixing or ignoring a style check's output.


One thing which I forgot to mention (because 99.9999999% of the people don't do it, so it's understandable that it's been missed) is that CodeChecker severities can be fine-tuned, as it's literally just a JSON file. I haven't touched actual CodeChecker code for a while now, but I think if you rewrite this JSON as you see fit before running an analysis or starting a server, you can go as arbitrary as you want.
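As a rough sketch of what such fine-tuning could look like (the checker names and the flat name-to-severity layout below are assumptions for illustration, not CodeChecker's documented schema):

```python
import json

# Assumed layout: a flat JSON object mapping checker name -> severity label.
severity_json = (
    '{"readability-magic-numbers": "STYLE",'
    ' "bugprone-integer-division": "MEDIUM"}'
)

severity_map = json.loads(severity_json)
# Override one entry before starting an analysis or a server.
severity_map["bugprone-integer-division"] = "HIGH"

print(json.dumps(severity_map, indent=2, sort_keys=True))
```

The edited mapping would then be picked up in place of the shipped defaults, under the assumption above that the file is consulted at analysis or server start.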

@whisperity I think you misunderstood my comment. I was not trying to give a more correct description of the current definition of style-level severity in CodeChecker; I was trying to propose new definitions of the different severity levels that this patch proposes for clang-tidy.
So the claim that my suggestion is not "true" is invalid, since this isn't a matter of true or false but merely a matter of how one defines the style level.

My reason for proposing these new definitions is that I think they highlight how the levels could be made more distinctly different in their definitions. I also think these severity levels need more distinct definitions to be more useful than they are problematic. I'm afraid that definitions which people may interpret differently are at significant risk of causing more confusion than clarification.
If severity levels must be exactly like they are currently defined in CodeChecker then IMO that is a rather strong reason not to introduce them in clang-tidy and just keep that stuff in CodeChecker.

A low diagnostic (and everything "above", assuming a(n at least) partial ordering on severities) should mean the coding construct is problematic: there is a chance -- perhaps once in a lifetime, perhaps not -- that doing this will cause a real error. A style thing should mean "Hey, whatever you have written is pretty much A-OK and works, congrats for writing valid (Obj)?C(++)?, but please realise that we are writing DERP/C++-42 and not C89, and please move your loop variable inside the loop's statement". No real "game-breaking" issue should ever arise from deciding on fixing or ignoring a style check's output.

If we are to distinguish severity levels based only on the probability that violating the rule will cause an error, then we should support every choice of severity level with data, something which may be quite hard for most contributors writing a new check to actually achieve.

@vingeldal Apologies, in that case. However, I would still claim that style (as a potential severity) has its purpose for Tidy checkers, not just for clang-format.

If severity levels must be exactly like they are currently defined in CodeChecker then IMO that is a rather strong reason not to introduce them in clang-tidy and just keep that stuff in CodeChecker.

This is exactly the reason why I urged @sylvestre.ledru in D36051 to mention that the levels were taken from CodeChecker's classification. Which, by all means, could be viewed as "just another" classification. @dkrupp will hopefully explain the decision making behind the classification when he is back from holidays - give it a week more or so.

I believe the severity categories and assigned severity levels in CodeChecker are also malleable. Discussion brings us forward if there is a chance of coming up with better ideas, it's just a little bit of JS and database wrangling to put in a new severity into CodeChecker's system.


CodeChecker actually uses a two-dimensional classification: I think severities indicate some sort of relative "power" of the report itself; the higher the severity, the more likely the reported issue will actually cause trouble. I don't know the nitty-gritty of where the demarcation line is drawn between low and medium. My previous comment, as follows:

A low diagnostic (and everything "above", assuming a(n at least) partial ordering on severities) should mean the coding construct is problematic: there is a chance -- perhaps once in a lifetime, perhaps not -- that doing this will cause a real error. A style thing should mean [snip]. No real "game-breaking" issue should ever arise from deciding on fixing or ignoring a style check's output.

was to indicate that any border between style and, let's say, whatever else on the other hand, should be immediately clear: style issues are not something to react to with a "STOP THE WORLD!" approach. In the case of a CI hook, if a developer receives a style report on their to-be-added patch, sure, please fix it before the commit; but if badly formatted code has already made it into the upstream repository, it's not urgent to have it fixed. (This is the same as we do in LLVM, too, except that we don't really have patch CIs yet, sadly.) This is what I meant by style being applicable to some Tidy checkers, not just clang-format.

Another classification used in CodeChecker, orthogonal to severity, is dubbed "checker profiles", which assigns checkers into, once again, arbitrarily named categories. When you run CodeChecker, you can say --enable default --enable extreme --enable alpha, which will turn on every checker from the two profiles plus the "alpha." Clang SA checker group (or "namespace"). These profiles, according to the comments in the file, were decided based on the "strength" of the checker (as opposed to the "strength" of the report), namely that checkers with a higher false-positive rate (which directly translates to an inverse desire to even look at the reports) are grouped into "more extreme" categories. This is simply to give users a good enough starting point for their analysis; everyone can run the analyzers however they see fit.
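A minimal sketch of the profile mechanism described above (the profile names follow the comment; the checker assignments are invented for illustration):

```python
# Named, possibly overlapping groups of checkers; enabling several
# profiles takes the union of their members.
PROFILES = {
    "default": {"bugprone-use-after-move", "cert-err34-c"},
    "extreme": {"bugprone-use-after-move", "readability-magic-numbers"},
    "alpha":   {"alpha.core.BoolAssignment"},
}

def enabled_checkers(*profiles: str) -> set[str]:
    """Union of all checkers in the requested profiles."""
    selected: set[str] = set()
    for name in profiles:
        selected |= PROFILES[name]
    return selected

# Roughly what `--enable default --enable extreme --enable alpha` selects.
checks = enabled_checkers("default", "extreme", "alpha")
```

Because the profiles may overlap, enabling several of them is a set union rather than a concatenation, so no checker runs twice.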

I may have missed this in prior discussions, and if so, I'm sorry -- but why are we taking CodeChecker as the model for this? I'm not opposed to having some notion of severity for our checks (basically every tool and many coding standards have the same concept), but I am not certain why we would say CodeChecker's way is the way we want clang-tidy to progress. Also, does this tie in with the clang static analyzer checks, or is the severity stuff only for clang-tidy?

clang-tools-extra/docs/clang-tidy/checks/list.rst
414

I'm not keen on "is against a specific coding guideline" because things like the CERT secure coding standard and HIC++ are both "coding guidelines" yet failures against them are almost invariably not stylistic.

417

"Hard to read/understand" code can also be based on style choices -- so how to differentiate between low and style checks?

423

"Run-time error" is a strange bar because any logical mistake in the code is a run-time error. However, we also don't want to limit high severity cases to only things that cause crashes because some kinds of crashes are not bad (like assertion failures) while other kinds of non-crashes are very bad (like dangerous uses of format specifiers that can lead to arbitrary code execution).

I may have missed this in prior discussions, and if so, I'm sorry -- but why are we taking CodeChecker as the model for this? I'm not opposed to having some notion of severity for our checks (basically every tool and many coding standards have the same concept), but I am not certain why we would say CodeChecker's way is the way we want clang-tidy to progress. Also, does this tie in with the clang static analyzer checks, or is the severity stuff only for clang-tidy?

Same thoughts here. In my opinion it would be preferable if clang-tidy didn't make any promises about being aligned with CodeChecker severity levels.

I may have missed this in prior discussions, and if so, I'm sorry -- but why are we taking CodeChecker as the model for this?

I went ahead and used it because:

  • it is there and maintained (I have contributed to the list a few times)
  • it is pretty good in my experience (it is rare that I look at the list of checkers/severities and disagree with the evaluation)
  • it is a good start to trigger some discussions
  • CodeChecker upstream is also involved in clang-tidy

I may have missed this in prior discussions, and if so, I'm sorry -- but why are we taking CodeChecker as the model for this?

I went ahead and used it because:

  • it is there and maintained (I have contributed to the list a few times)

There are other models that exist and are maintained.

  • it is pretty good in my experience (it is rare that I look at the list of checkers/severities and disagree with the evaluation)

Other models are also pretty good.

  • it is a good start to trigger some discussions

Definitely agreed! However, I think an RFC would have been a less invasive approach to starting those discussions.

  • CodeChecker upstream is also involved in clang-tidy

As are other tools and coding standards.

For me personally, I like the idea of giving users some idea of the severity for any given check. I think it provides valuable information to users and is a good thing to document, but only when applied consistently across checks. If we can't find a consistent heuristic to use across all the coding standards and one-off checks we support, the utility of telling users about the severity is pretty hampered (or worse yet, gives a false sense of how bad a problem may be). I'd rather see the severity stuff reverted out of trunk until we've got some broader community agreement that 1) we want this feature, 2) we enumerate concrete steps for how to pick a severity for any given check, and 3) what to do when our severity differs from an external severity in the case of checkers for coding standards with that concept.

FWIW, I didn't notice this change was even proposed because it happened in a review about moving from a long list of checkers to one using tables and from what I can tell, it looks like the patch landed without that review being accepted (in fact, it seems to have landed with a code owner having marked it as requiring changes).

OK, do you want me to prepare a patch to remove the severities?
or to update the values using another list?

OK, do you want me to prepare a patch to remove the severities?
or to update the values using another list?

I think we should remove the severities from the table for now. The rest of the table looks fine to me and is a big improvement over the lists.

sylvestre.ledru abandoned this revision. · Dec 31 2019, 11:29 AM

ok, thanks!
I will remove them tomorrow or the next day.

Do you have any guidance about the next steps to add them back?

@aaron.ballman done in https://reviews.llvm.org/D72049

By the way, when you say:

There are other models that exist and are maintained.

Other models are also pretty good.

which lists do you have in mind?
thanks

ok, thanks!
I will remove them tomorrow or the next day.

Do you have any guidance about the next steps to add them back?

Yes, sorry about failing to talk about it! I think this is worth an RFC that CCs some of the main folks from clang-tidy and the clang static analyzer to see if there's an appetite for supporting the concept. The RFC can include information like what problem this is solving, why we should pay the maintenance and review burden to support it, and some concrete heuristics for picking a severity as consistently as possible (what you have above is an okay start, but often won't lead to consistently picking a severity because of the overlap in the descriptions). As part of the RFC, it would be helpful if you pointed out how some of the coding standards we support calculate severities (if you can find the information) and how related tools like codechecker (etc) calculate severity to see if we can find a heuristic that works for us. If you don't have all of the answers in the RFC, that's fine -- the hope is to get the discussion going in the right directions, not to start off with a perfect solution.

@aaron.ballman done in https://reviews.llvm.org/D72049

By the way, when you say:

There are other models that exist and are maintained.

Other models are also pretty good.

which lists do you have in mind?
thanks

I know one reasonably well because I worked on the coding standard, which is CERT's way of calculating rule priorities based on several independent factors. More information can be found at: https://wiki.sei.cmu.edu/confluence/display/c/How+this+Coding+Standard+is+Organized#HowthisCodingStandardisOrganized-RiskAssessment but the heuristics are the same for the C++ coding standard as well. Also, it is common for static analysis tools to calculate severities for given check violations, so we may want to see if any of them document their heuristics.
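For reference, the CERT scheme at that link rates each rule on three one-to-three scales and multiplies them into a priority; a sketch of that computation (the bucket boundaries follow the linked page, the function itself is only illustrative):

```python
def cert_priority(severity: int, likelihood: int,
                  remediation_cost: int) -> tuple[int, str]:
    """CERT risk assessment: each factor is rated 1 (low) to 3 (high);
    the product is the rule's priority, bucketed into levels L1..L3."""
    for factor in (severity, likelihood, remediation_cost):
        if factor not in (1, 2, 3):
            raise ValueError("each factor must be 1, 2, or 3")
    priority = severity * likelihood * remediation_cost
    if priority >= 12:
        level = "L1"  # most urgent: priorities 12, 18, 27
    elif priority >= 6:
        level = "L2"  # priorities 6, 8, 9
    else:
        level = "L3"  # priorities 1 through 4
    return priority, level
```

A scheme like this derives the level from independent factors rather than assigning it directly, which is one way to get the consistency discussed earlier in the thread.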