This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Add misc-invalidated-iterators check.
Needs ReviewPublic

Authored by mnbvmar on Jan 25 2017, 3:11 PM.

Details

Summary

This is the alpha version of a check looking for possible uses of invalidated references or iterators.

As for now, only references to std::vector are checked. (This is quite easy to fix, though).
As for now, it considers only references to the elements. Considering pointers or iterators is harder and probably requires some deeper analysis (as in misc-use-after-move check).

Diff Detail

Event Timeline

mnbvmar created this revision.Jan 25 2017, 3:11 PM

Please mention this check in docs/ReleaseNotes.rst (see previous releases as example).

test/clang-tidy/misc-invalidated-iterators.cpp
63

Unnecessary empty line.

75

Unnecessary empty line.

126

Please fix this.

Eugene.Zelenko set the repository for this revision to rL LLVM.
Eugene.Zelenko added a subscriber: cfe-commits.

General question: isn't Static Analyzer is better place for such workflow checks?

Prazek added inline comments.Jan 25 2017, 3:44 PM
clang-tidy/misc/InvalidatedIteratorsCheck.cpp
61

Please tests for all of this functions

92

.insert({MemoTuple, false})

99

I guess this might be to optimistic assumption. Normally we should be pesimistic to not introduce false positives. I will run thi check on llvm code base with SmallVector instead of std::vector and will see.

175–179

const auto BlockMap = std::make_unique<StmtToBlockMap>(TheCFG.get(), Result.Context)

etc.

182–184

remove extra braces

docs/clang-tidy/checks/misc-invalidated-iterators.rst
18

suggest changing it to
ref = 42;

When first reading I thought that it is iterator

test/clang-tidy/misc-invalidated-iterators.cpp
11–17

iterator in vector is just raw pointer, so I guess you can replace it with

using iterator = Type*;

29

Few testcases:

  • passing vector as pointer
  • passing vector as copy, it shouldn't care what happens

to modyfied vector.

  • modyfing vector inside loop - does it finds it? like: for (auto &ref : vec) { vec.push_back(42); ref = 42; }
125

add new line to file end

Prazek edited edge metadata.Jan 26 2017, 4:12 AM

General question: isn't Static Analyzer is better place for such workflow checks?

Probably yes, but it is much harder to implement this there. Other problem is that it would be probably a part
of one of the alpha checks, that are not developed and I don't know if they will ever be fully supported.

clang-tidy/misc/InvalidatedIteratorsCheck.h
35

drop clang-tidy. It's cleaner

37

).

General question: isn't Static Analyzer is better place for such workflow checks?

Probably yes, but it is much harder to implement this there. Other problem is that it would be probably a part
of one of the alpha checks, that are not developed and I don't know if they will ever be fully supported.

But it'll be necessary to re-implement part of Static Analyzer functionality in Clang-tidy. Will it be generic enough to be used for similar purposes?

It'll be good idea to find out Static Analyzer release criteria.

The static analyzer is definitely the place to go for bug detection that requires path sensitivity. It's also reasonably good for anything that needs flow-sensitivity. Although, most flow-sensitive analysis (such as liveness) now live in lib/Analysis/, which is used by both Sema and the Clang Static Analyzer. Both path-sensitive and flow-sensitive analysis are based on clang's CFG. I do not know of clang-tidy uses CFG or lib/Analysis/.

Here are the wikipedia definitions of sensitivity I am referring to:
//- A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".

  • A path-sensitive analysis computes different pieces of analysis information dependent on the predicates at conditional branch instructions. For instance, if a branch contains a condition x>0, then on the fall-through path, the analysis would assume that x<=0 and on the target of the branch it would assume that indeed x>0 holds. //

There is work underway in the analyzer for adding IteratorPastEnd checker. The first commit was rather large and has been reviewed here https://reviews.llvm.org/D25660. That checker is very close to be moved out of alpha. Moving it out of alpha is pending evaluation on real codebases to ensure that the false positive rates are low and enhancements to error reporting.

Other problem is that it would be probably a part
of one of the alpha checks, that are not developed and I don't know if they will ever be fully supported.

There seems to be a misconception about what the alpha checkers are. All checkers start in alpha package to allow in-tree incremental development. Once the checkers are complete, they should move out of alpha. The criteria is that the code should pass the review, the checker needs to have low false positive rates, finally, the error reports should be of good quality (some checks greatly benefit from extra path hints that can be implemented with a visitor). These are motivated by providing a good user experience to end users. (We do have several checkers that are "stuck in alpha". A lot of them are abandoned experiments that just need to be deleted; others could probably be made production quality with some extra effort.)

The static analyzer is definitely the place to go for bug detection that requires path sensitivity. It's also reasonably good for anything that needs flow-sensitivity. Although, most flow-sensitive analysis (such as liveness) now live in lib/Analysis/, which is used by both Sema and the Clang Static Analyzer. Both path-sensitive and flow-sensitive analysis are based on clang's CFG. I do not know of clang-tidy uses CFG or lib/Analysis/.

Here are the wikipedia definitions of sensitivity I am referring to:
//- A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".

  • A path-sensitive analysis computes different pieces of analysis information dependent on the predicates at conditional branch instructions. For instance, if a branch contains a condition x>0, then on the fall-through path, the analysis would assume that x<=0 and on the target of the branch it would assume that indeed x>0 holds. //

There is work underway in the analyzer for adding IteratorPastEnd checker. The first commit was rather large and has been reviewed here https://reviews.llvm.org/D25660. That checker is very close to be moved out of alpha. Moving it out of alpha is pending evaluation on real codebases to ensure that the false positive rates are low and enhancements to error reporting.

Other problem is that it would be probably a part
of one of the alpha checks, that are not developed and I don't know if they will ever be fully supported.

There seems to be a misconception about what the alpha checkers are. All checkers start in alpha package to allow in-tree incremental development. Once the checkers are complete, they should move out of alpha. The criteria is that the code should pass the review, the checker needs to have low false positive rates, finally, the error reports should be of good quality (some checks greatly benefit from extra path hints that can be implemented with a visitor). These are motivated by providing a good user experience to end users. (We do have several checkers that are "stuck in alpha". A lot of them are abandoned experiments that just need to be deleted; others could probably be made production quality with some extra effort.)

I think we need some sort of clear guidelines on where what functionality should be added. Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate. The other example is Use after move, that is doing similar thing as uninitialized values analysis in clang.

I think we need some sort of clear guidelines on where what functionality should be added. Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate. The other example is Use after move, that is doing similar thing as uninitialized values analysis in clang.

I agree with this and could add that we also need similar guidelines for Clang diagnostics too. Some of Clang-tidy and Static Analyzer checks are more suited for Clang diagnostics (unused constructs, deadcode.DeadStores, etc). It will be reasonable if we have discussions to decide better place for particular check instead of just swallowing patches.

Before clang-tidy came into existence the guidelines were very clear. One should write a clang warning if the diagnostic:

  • can be implemented without path-insensitive analysis (including flow-sensitive)
  • is checking for language rules (not specific API misuse)

In my view, this should still be the rule going forward because compiler warnings are most effective in reaching users.

The Clang Static Analyzer used to be the place for all other diagnostics. Most of the checkers it contains rely on path-sensitive analysis. Note that one might catch the same bug with flow-sensitive as well as path-sensitive analysis. So in some of those cases, we have both warnings as well as analyzer checkers finding the same type of user error. However, the checkers can catch more bugs since they are path-sensitive. The analyzer also has several flow-sensitive/ AST matching checkers. Those checks could not have been written as warnings mainly because they check for APIs, which are not part of the language.

My understanding is that clang-tidy supports fixits, which the clang static analyzer currently does not support. However, there could be other benefits to placing not path-sensitive checks there as well. I am not sure. I've also heard opinions that it's more of a linter tool, so the user expectations could be different.

Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate.

Again, alpha checks are unfinished work, so we should not use them as examples of checkers that have false positives. Some of them are research projects and should probably be deleted.

alexfh edited edge metadata.Feb 3 2017, 9:47 AM

Before clang-tidy came into existence the guidelines were very clear. One should write a clang warning if the diagnostic:

  • can be implemented without path-insensitive analysis (including flow-sensitive)
  • is checking for language rules (not specific API misuse)

In my view, this should still be the rule going forward because compiler warnings are most effective in reaching users.

The Clang Static Analyzer used to be the place for all other diagnostics. Most of the checkers it contains rely on path-sensitive analysis. Note that one might catch the same bug with flow-sensitive as well as path-sensitive analysis. So in some of those cases, we have both warnings as well as analyzer checkers finding the same type of user error. However, the checkers can catch more bugs since they are path-sensitive. The analyzer also has several flow-sensitive/ AST matching checkers. Those checks could not have been written as warnings mainly because they check for APIs, which are not part of the language.

My understanding is that clang-tidy supports fixits, which the clang static analyzer currently does not support. However, there could be other benefits to placing not path-sensitive checks there as well. I am not sure. I've also heard opinions that it's more of a linter tool, so the user expectations could be different.

Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate.

Again, alpha checks are unfinished work, so we should not use them as examples of checkers that have false positives. Some of them are research projects and should probably be deleted.

I tried to come up with some advice on where checks should go in http://clang.llvm.org/extra/clang-tidy/#choosing-the-right-place-for-your-check:

Clang diagnostic: if the check is generic enough, targets code patterns that most probably are bugs (rather than style or readability issues), can be implemented effectively and with extremely low false positive rate, it may make a good Clang diagnostic.
Clang static analyzer check: if the check requires some sort of control flow analysis, it should probably be implemented as a static analyzer check.
clang-tidy check is a good choice for linter-style checks, checks that are related to a certain coding style, checks that address code readability, etc.

There's no doubt path-sensitive checks should go to Static Analyzer, since it provides all the infrastructure for path-sensitive analyses. Whatever meets the criteria for a Clang diagnostic should be a diagnostic. Whatever needs automated fixes (and can be implemented on AST or preprocessor level) should go to clang-tidy.

But there's still a large set of analyses that don't clearly fall into one of the categories above and can be implemented both in Clang Static Analyzer and in clang-tidy. Currently there are no firm rules about those, only recommendations on the clang-tidy page (quoted above). We might need to agree upon some reasonable guidelines, though.

For example, IMO, AST-based analyses make more sense in clang-tidy for a few reasons:

  1. They usually are easier expressible in terms of AST matchers, and clang-tidy allows to use AST matchers with less boilerplate.
  2. Clang Static Analyzer is linked into clang, where AST matchers are undesired, since they tend to blow the size of binary significantly.
  3. It's more consistent to keep all similar analyses together, it simplifies search for already implemented similar functionality and code reviews.

Flow-sensitive analyses (that don't need any path-sensitivity) seem to be equally suitable for SA and clang-tidy (unless I'm missing something), so I don't feel strongly as to where they should go.

WDYT?

jbcoe added a subscriber: jbcoe.Feb 3 2017, 10:05 AM
Prazek added a comment.Feb 4 2017, 6:40 AM

Before clang-tidy came into existence the guidelines were very clear. One should write a clang warning if the diagnostic:

  • can be implemented without path-insensitive analysis (including flow-sensitive)
  • is checking for language rules (not specific API misuse)

In my view, this should still be the rule going forward because compiler warnings are most effective in reaching users.

The Clang Static Analyzer used to be the place for all other diagnostics. Most of the checkers it contains rely on path-sensitive analysis. Note that one might catch the same bug with flow-sensitive as well as path-sensitive analysis. So in some of those cases, we have both warnings as well as analyzer checkers finding the same type of user error. However, the checkers can catch more bugs since they are path-sensitive. The analyzer also has several flow-sensitive/ AST matching checkers. Those checks could not have been written as warnings mainly because they check for APIs, which are not part of the language.

My understanding is that clang-tidy supports fixits, which the clang static analyzer currently does not support. However, there could be other benefits to placing not path-sensitive checks there as well. I am not sure. I've also heard opinions that it's more of a linter tool, so the user expectations could be different.

Even right now there are clang-tidy checks that finds subset of alpha checks, but probably having lower false positive rate.

Again, alpha checks are unfinished work, so we should not use them as examples of checkers that have false positives. Some of them are research projects and should probably be deleted.

I tried to come up with some advice on where checks should go in http://clang.llvm.org/extra/clang-tidy/#choosing-the-right-place-for-your-check:

Clang diagnostic: if the check is generic enough, targets code patterns that most probably are bugs (rather than style or readability issues), can be implemented effectively and with extremely low false positive rate, it may make a good Clang diagnostic.
Clang static analyzer check: if the check requires some sort of control flow analysis, it should probably be implemented as a static analyzer check.
clang-tidy check is a good choice for linter-style checks, checks that are related to a certain coding style, checks that address code readability, etc.

There's no doubt path-sensitive checks should go to Static Analyzer, since it provides all the infrastructure for path-sensitive analyses. Whatever meets the criteria for a Clang diagnostic should be a diagnostic. Whatever needs automated fixes (and can be implemented on AST or preprocessor level) should go to clang-tidy.

But there's still a large set of analyses that don't clearly fall into one of the categories above and can be implemented both in Clang Static Analyzer and in clang-tidy. Currently there are no firm rules about those, only recommendations on the clang-tidy page (quoted above). We might need to agree upon some reasonable guidelines, though.

For example, IMO, AST-based analyses make more sense in clang-tidy for a few reasons:

  1. They usually are easier expressible in terms of AST matchers, and clang-tidy allows to use AST matchers with less boilerplate.
  2. Clang Static Analyzer is linked into clang, where AST matchers are undesired, since they tend to blow the size of binary significantly.
  3. It's more consistent to keep all similar analyses together, it simplifies search for already implemented similar functionality and code reviews.

Flow-sensitive analyses (that don't need any path-sensitivity) seem to be equally suitable for SA and clang-tidy (unless I'm missing something), so I don't feel strongly as to where they should go.

WDYT?

This sounds resonable. I think there are also other factors like:

  • heuristics: clang-tidy tends to look for bugs based on some heuristics. E.g use-after-move consider methods as reinitializing based on method names (like clear).

This is approach works great if the heuristic is good enough to not have many false positives, but generally it will have some false positives and false negatives.

  • flexibility: The other problem is that check can be flexible, for example to work on every vector-like container. I think static analyzer tends to be more conservative about bugs it wants to find. E.g we know how std::vector works, so the check will find only bugs associated with it trying to not introduce false positives

I will try to think about other factors, but firstly let me give a couple of examples. Please tell me where would you put these checks and exactly why:

use-after-move:

  • does complex walk on CFG similar to path analysis
  • has heuristics what is reinitialization

this check:

  • uses the same utilities as use-after-move
  • will works for different types of containers like std::unordered_map or llvm::DenseMap, which indicates heuristics

check null after dereference, looks for code like:

int *p = foo();
*p = 42;
if (p) {;;;}
//of 
if (!p) {;;;}

Some checks from static analyzer will find the body of if (!p) as dead code, and could probably find if (p) as expression always true.
I am not sure how it would handle cases like

int *p = foo();
*p = 42;
if (b) {
  p = otherFoo();
}

if (!p) {;;;}

Should static analyzer flag use like this?
If we want to have low false positive rate, then probably not.
On the other hand I would love if clang-tidy would flag this, because it seems like a buggy code (if (!p) {;;;} could be hoisted to the other if)

Another example, fiding out of range:

int glob[10];

int sum = 0;
for (int i = 0; i <= 10; i++) {
  sum += glob[i];
}

So this sounds like it a good diagnostic for clang, and even gcc finds it, but only with -O2.
Of course it won't flag cases like:

for (int i = 0; i <= 10; i++) {
  if (glob[i] == v)
    return true;
}

Because the loop might end before going to element pass array. I am not sure how well we can model
this with static analyzer, and if it is even possible to write check that would warn about cases like this, that are bugs (we can optimize it to single return true).

I tried to come up with some advice on where checks should go in http://clang.llvm.org/extra/clang-tidy/#choosing-the-right-place-for-your-check

The guidelines stated on the clang-tidy page seem reasonable to me.

For example, IMO, AST-based analyses make more sense in clang-tidy for a few reasons:

They usually are easier expressible in terms of AST matchers, and clang-tidy allows to use AST matchers with less boilerplate.
Clang Static Analyzer is linked into clang, where AST matchers are undesired, since they tend to blow the size of binary significantly.
It's more consistent to keep all similar analyses together, it simplifies search for already implemented similar functionality and code reviews.

I agree that there is a gray area specifically in the AST-matching-based bug finding, which unfortunately leads to duplication of effort. However, we currently cannot ban/move all AST-based checks out of the analyzer. The main problem I see with restricting the AST-based analysis to clang-tidy is impact on the end users. While clang-tidy does export the clang static analyzer results, the reverse does not hold. There are many end users who do not use clang-tidy but do use the clang static analyzer (either directly or through scan-build). Until that is the case, I do not think we should disallow AST based checks from the analyzer, especially if they come from contributors who are interested in contributing to this tool. Note that the format in which the results are presented from these tools is not uniform, which makes transition more complicated.

The AST matchers can be used from the analyzer and if the code size becomes a problem, we could consider moving the analyzer out of the clang codebase.

Generally, I am not a big fan of the split between the checks based on the technology used to implement them since it does not lead to a great user interface - the end users do not think in terms of path-sensitive/flow-sensitive/AST-matching, they just want to enable specific functionality such as find bugs or ensure they follow coding guidelines. This is why I like the existing guidelines with their focus on what clang-tidy is from user perspective:

clang-tidy check is a good choice for linter-style checks, checks that are related to a certain coding style, checks that address code readability, etc.

Another thing that could be worth adding here is that clang-tidy finds bugs that tend to be easy to fix and, in most cases, provide the fixits. (I believe, this is one of the major strengths of the clang-tidy tool!)

Flow-sensitive analyses (that don't need any path-sensitivity) seem to be equally suitable for SA and clang-tidy (unless I'm missing something), so I don't feel strongly as to where they should go.

As far as I know, currently, all flow-sensitive analysis are in the Analysis library in clang. These are analysis for compiler warnings and analysis used by the static analyzer. I believe this is where future analysis should go as well, especially, for analysis that could have multiple clients. If clang code size impact turns out to be a serious problem, we could also have that library provide APIs that could be used from other tools/checks. (Note, the analysis could be implemented in the library, but the users of the analysis (checks) can be elsewhere.)

Regarding the points about "heuristics" and "flexibility", the analyzer is full of heuristics and fuzzy API matching. These techniques are very common in static analysis in general as we are trying to solve undecidable problems and heuristics is the only way to go.

Prazek added a comment.Feb 6 2017, 1:11 PM

I tried to come up with some advice on where checks should go in http://clang.llvm.org/extra/clang-tidy/#choosing-the-right-place-for-your-check

The guidelines stated on the clang-tidy page seem reasonable to me.

For example, IMO, AST-based analyses make more sense in clang-tidy for a few reasons:

They usually are easier expressible in terms of AST matchers, and clang-tidy allows to use AST matchers with less boilerplate.
Clang Static Analyzer is linked into clang, where AST matchers are undesired, since they tend to blow the size of binary significantly.
It's more consistent to keep all similar analyses together, it simplifies search for already implemented similar functionality and code reviews.

I agree that there is a gray area specifically in the AST-matching-based bug finding, which unfortunately leads to duplication of effort. However, we currently cannot ban/move all AST-based checks out of the analyzer. The main problem I see with restricting the AST-based analysis to clang-tidy is impact on the end users. While clang-tidy does export the clang static analyzer results, the reverse does not hold. There are many end users who do not use clang-tidy but do use the clang static analyzer (either directly or through scan-build). Until that is the case, I do not think we should disallow AST based checks from the analyzer, especially if they come from contributors who are interested in contributing to this tool. Note that the format in which the results are presented from these tools is not uniform, which makes transition more complicated.

The AST matchers can be used from the analyzer and if the code size becomes a problem, we could consider moving the analyzer out of the clang codebase.

Generally, I am not a big fan of the split between the checks based on the technology used to implement them since it does not lead to a great user interface - the end users do not think in terms of path-sensitive/flow-sensitive/AST-matching, they just want to enable specific functionality such as find bugs or ensure they follow coding guidelines. This is why I like the existing guidelines with their focus on what clang-tidy is from user perspective:

Agree with it.

clang-tidy check is a good choice for linter-style checks, checks that are related to a certain coding style, checks that address code readability, etc.

Another thing that could be worth adding here is that clang-tidy finds bugs that tend to be easy to fix and, in most cases, provide the fixits. (I believe, this is one of the major strengths of the clang-tidy tool!)

Flow-sensitive analyses (that don't need any path-sensitivity) seem to be equally suitable for SA and clang-tidy (unless I'm missing something), so I don't feel strongly as to where they should go.

As far as I know, currently, all flow-sensitive analysis are in the Analysis library in clang. These are analysis for compiler warnings and analysis used by the static analyzer. I believe this is where future analysis should go as well, especially, for analysis that could have multiple clients. If clang code size impact turns out to be a serious problem, we could also have that library provide APIs that could be used from other tools/checks. (Note, the analysis could be implemented in the library, but the users of the analysis (checks) can be elsewhere.)

Regarding the points about "heuristics" and "flexibility", the analyzer is full of heuristics and fuzzy API matching. These techniques are very common in static analysis in general as we are trying to solve undecidable problems and heuristics is the only way to go.

I guess the main problem is that you can't use clang-tidy checks from scan build, where there are some checks like use-after-move, and bunch of misc checks, that would totally fit into what user want from scan build.