[Analyzer] Iterator Checkers
Needs ReviewPublic

Authored by baloghadamsoftware on Wed, Apr 12, 5:06 AM.



IteratorRangeChecker, MismatchedIteratorChecker and InvalidatedIteratorChecker integrated in one single checker file. The checker always keeps track of the status of the iterators, but verification of out-of-range access, invalidated iterator access and mismatched iterator use can be separately enabled or disabled.

Tested on real code (Clang) without crash. Some false positives are still present.

Diff Detail

@baloghadamsoftware Thanks for working on this!

However, this patch is getting too big. Could you, please, split it into incremental patches so that it would be easier to review? More motivation of why this is very important is here http://llvm.org/docs/DeveloperPolicy.html#incremental-development

NoQ added a comment.Wed, Apr 19, 9:57 PM

I think for this case it'd be great to (instead) add comments all over the place (especially in checker callbacks, eg. check[Pre|Post]Call() function bodies) to indicate what check every piece of code is for. That'd be equally easy to review, at least for me, but it'd also help greatly if we start splitting up checkers into modelling and checking parts.


I've got a fresh idea on this: Maybe it'd be easier to make an llvm::PointerUnion<SymbolRef, const MemRegion *> and enjoy a single map from that to IteratorPosition, rather than maintaining two maps.

I don't insist though, because i didn't really think if it makes the code any simpler.

NoQ added a comment.Wed, Apr 19, 10:59 PM


  1. Could you do renaming the checker file in a separate patch, so that we saw an actual diff, not a whole greenish file, here on phabricator?
  2. The invalidated iterator checker looks relatively small (a single check and a few rounds of iterator invalidations), would it be hard to split it up into a separate patch? That'd leave us with just one checker added.

Hello! I am not sure how to split it to incremental patches. As I mentioned in the description, the state of the iterator position is always tracked. So if I remove some of the checks it does not make the patch much smaller, since the checking part is quite small. If I remove parts of the tracking, I introduce lots of false positives and uncaught bugs.

Container and offset tracking is needed for both range and validity checks. Match check does not need the offset, but removing it means lots of work, and does not help either because the second patch (adding range and validity checks incrementally) will remain huge.

Doing the rename in a separate patch is useless, because this checker is totally new: the whole position tracking uses a new concept. What I reused is the method of transferring the position between various SVals. Doing the rename separately would mean the same amount of green and lots of red.

What I could do is to rearrange of the functions a bit (separate tracking and checking) and adding more comments to the function. Would it help?

NoQ added a comment.Fri, Apr 21, 3:50 AM

I had a look at how hard it is to split the patch.

In ~3 hours i reached this far:

That job is around 40% done; in patch 0002, the checker has around 1.5k lines of code. Would you be willing to continue this? It should be possible to go on splitting up features until there's just one checker under, say, maybe ~500 lines of code (blind guess), with the only positive being simple_bad_end() in iterator-range.cpp.

My workflow was like this:

  1. Put the original patch into a git branch.
  2. Choose a feature that seems relatively independent. Good canindates for such features are:
    • Support for particular API.
    • Any particular hack.
    • Any particular rule to check in the checker.
    • The checker itself when it's already small enough.
  3. Remove the feature and all tests that start failing.
    • Watch out for "unused function" compiler warnings.
    • Watch out for related tests, eg. remove "good" when you remove "bad".
    • Watch out for the system header simulator:
      • You'd need to update diagnostics/explicit-suppression.cpp test every time.
      • See if what you're removing was actually added by you, and no unrelated analyzer tests fail when you remove it.
    • This step is quite blind. If something is too interdependent, pick another feature.
  4. Commit the removal to your branch.
  5. Go to 2 unless satisfied with the work you've done.
  6. Mass revert all removal commits in the opposite order. Revert of removal is addition, hence the revert-commits can go on review.
  7. Mass squash all removal commits into the original commit. That'd be the first commit to go on review.

If you continue this, you'd take my patches 0001 and 0002 instead of your original patch, follow the same method, and then once you're done, add my other patches.

I believe the next good steps would be to remove the invalidated iterator checker (together with support for the clear() API) and then probably try to split out the support for iterator comparisons or support for increments/decrements, then maybe separate access checks from dereference checks.

Once this is done, it'd be great to have a separate review for each patch; it'd put a bit of stress on you to do the rebases once you address comments on the older patches, but that'd make everybody who wants to review the patch, either here on phabricator, or after the commit, a lot happier. I understand that it's not entirely a pleasant thing to do; i also understand that i might have caused confusion by asking you to merge checkers, while i didn't really mean to ask you to merge the patches as well; but it seems that it is universally accepted around there that changes should be small, so i guess it'd worth it.

Note that the point of incremental development is, of course, not splitting up huge patches before submitting to the review; the point is to actually start developing the second small patch only after the first small patch is published. Like, it's perfectly fine to submit an unfinished checker that does 10% of what you want and finds just one tiny bug and has completely obvious false positives due to lack of various modeling, as long as you say "i'd add these features in a follow-up patch". But now that we'd have to incrementally develop retroactively, it is still considered to be better than having huge commits.