This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Checker for iterators dereferenced beyond their range.
ClosedPublic

Authored by baloghadamsoftware on Oct 16 2016, 8:57 AM.

Details

Summary

This checker checks for iterators dereferenced when they are equal to the end() of their container. Return value of any end() method is tracked if its type has the same properties as a typical iterator (can be incremented, dereferenced, and its name ends with "iterator", "iter" or "it"). STL functions that search a value or range are evaluated by the checker as an optimization.

Diff Detail

Event Timeline

baloghadamsoftware retitled this revision from to [Analyzer] Checker for iterators dereferenced beyond their range..
baloghadamsoftware updated this object.
baloghadamsoftware added a reviewer: dcoughlin.
dkrupp added a subscriber: dkrupp.Oct 16 2016, 11:28 PM
NoQ added a subscriber: NoQ.Oct 17 2016, 12:22 PM
NoQ added a subscriber: a.sidorin.Oct 18 2016, 1:57 AM

Wow, you managed to check something that could be checked without going through a hell of modeling dozens of STL methods, and probably even without stepping on poor C++ temporary object modeling in the analyzer, which sounds great.

These comments are incomplete because i didn't yet take my time to understand how your program state traits work; hope to come back to this a bit later.

Adding Alexey because he's fond of iterators.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
116

Maybe llvm::PointerUnion?

176

I think functions should always begin with a stack frame context, not sure, does this ever get violated? Do we have checkBeginBlock? Sorry if i'm wrong.

184

LLVM cast<> should be used, because it asserts cast correctness through LLVM's custom RTTI (and LocationContext child classes do support that).

196

I think this trick needs more comments/explaining. It is very unusual. Are you trying to model effects of passing an iterator by value into a function? What part of these effects are not modeled magically by the core?

359

So the thing about evalCall is that every call can only be eval'ed by only one checker. So if you're doing this, you should be sure that your checker is modelling all effects of the call on everything in the program state, manually, and any checker that relies on that modelling should make sure that your checker is turned on.

Because the functions you are modelling are pure, i think it's, in general, a good idea to evalCall() them. Other checkers should be able to rely on PreCall/PostCall events to model their state changes.

So the question is, in what checker do we want this modelling to happen. Because your checker is looking for very specific errors, it might be a good idea to eventually split it into a separate checker. I think, at least, a FIXME for this task should be left around. I'm also currently tackling with a single checker to model all standard library functions (D20811), maybe i'd come up with a way to merge it there.

444

Accessing end() is a UB, we should probably generate a fatal error node here.

447

I think path-sensitive checkers should present their findings proudly. After all, they did their best to find a single execution path on which the problem certainly manifests.

522

Number of arguments of CE should be checked beforehand. Yes, it is UB to modify namespace std:: to introduce functions with same names but less arguments, but we still should not crash when we see such code.

569

It's not analyzer's fault :) We're inspecting the AST here.

Anyway, does CXXRecordDecl::needsImplicitCopyAssignment() look useful?

test/Analysis/iterator-past-end.cpp
4

We should probably separate this into an #include-able header in test/Analysis/Inputs/.

Also, there's always a bit of concern that it wasn't copy-pasted from a standard library implementation with an incompatible license such as (L)GPL. Which often happens when you do your best to emulate the normal way of defining things as closely as possible.

Updated according to the comments. Also fixed a bug and moved access check to pre-call instead of post-call.

baloghadamsoftware marked 9 inline comments as done.Oct 26 2016, 7:06 AM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
196

If I pass an iterator by value (the most usual case) I have to assign its position (in or out of range) to the formal parameter from the actual one.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
569

No, it does not. I need to check whether the type is copiable, since that is a criteria for being an operator (copiable via constructor and assignment, deleteable, incrementable and dereferencable). It seems that while copy constructor and destructor is generated automatically, copy assignment not, at least not in this simple case. So I defaulted it to true, and I set it to false if I find a deleted or a non-public copy assignment.

test/Analysis/iterator-past-end.cpp
4

I did it now, but first one of my tests failed. I fixed the bug, but it turned out that if I include these types and functions, no method or function is checked, just conjured symbols are generated. Should including not behave the same as copying the contents? This happened even if I removed the pragma.

NoQ added a comment.Oct 27 2016, 3:13 AM

Thanks!! Will try to look at the rest of the stuff as soon as possible><

test/Analysis/iterator-past-end.cpp
4

Aha, i guess that's because we don't inline STL headers. See mayInlineCXXStandardLibrary() / -analyzer-config c++-stdlib-inlining.

The lesson to learn here is that it's a good idea to make tests as similar to real code as possible. Because on real code, it would probably also not be inlined.

test/Analysis/iterator-past-end.cpp
4

Actually, I always test first on real code, and it seemed to be inlined. But now, even if I removed the pragma it was not inlined.

a.sidorin added inline comments.Oct 27 2016, 7:09 AM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
157

This can be written some shorter: if (const auto *InstCall = dyn_cast<CXXInstance>(&Call)

210

As I remember, PostCall is also called for ObjC calls like ObjCMethodCall which may not have FunctionDecl as their callee. So, Func may be a nullptr and needs a check.

220

isa<StackFrameContext>(LCtx)?
And cast<> below already does the same check with an assertion.

259

Just C.getLocationContext()?

306

This loop may be C++11-fied.

323

What will happen if we compare two iterators related to different containers? I guess the result will be undefined but I'm not sure if we can track it in this checker without referencing the owning container. Let's leave this code as-is but I think this choice deserves a comment.

338

Maybe we should just swap Rhs and Lhs if LPos is null? So, we can avoid code duplication.

424

What will happen if we write something like this:

bool Eq1 = it1 == it2;
bool Eq2 = it3 == it4;
if (Eq1) {...}?

As I understand, we'll assume the second condition instead of first.

455

I'm not sure it's totally correct. -- for begin() will give us out-of-range iterator. According to header description, we're catching just "past-end" iterators, but this is confusing a bit for me.
Moreover, if we're out of end() in multiple positions, a single -- will not make the iterator valid again. You use a good conservative approach, but could you please add a comment describing it?

572

Just C.getLocationContext().

574

You can use overload which does not require the tag.

575

getLocationContext => LCtx

606

A common way of defining iterator types is just their declaration as pointers. I'm not sure this code will work well in such cases.
You can see some example in LLVM containers like SmallVector, where iterators are declared in the following way:

typedef T *iterator;
typedef const T *const_iterator;
619

HasCopyCtor, HasCopyAssign, etc.

621

We usually prefer informative names like "Method" or "Ctor".

624

There was a comment. Phabricator disallows me to delete my own comments so I was forced to edit it. Nevermind.

Thank you for this patch! I like some solutions used in it but I also have some comments (inline).

zaks.anna edited edge metadata.Oct 27 2016, 2:25 PM

Actually, I always test first on real code, and it seemed to be inlined. But now, even if I
removed the pragma it was not inlined.

Looks like this patch is interfering with this inlining suppression. We had many false positives without it. Mainly, the analyzer would not understand the invariants of the container data structures.

ExprEngine::defaultEvalCall calls mayInlineCallKind which contains this:
`// Conditionally control the inlining of methods on objects that look

// like C++ containers.
if (!Opts.mayInlineCXXContainerMethods())
  if (!Ctx.getSourceManager().isInMainFile(FD->getLocation()))
    if (isContainerMethod(Ctx, FD))
      return false;`
test/Analysis/iterator-past-end.cpp
4

We often do forward declare in the implementation file as it is done here. We mainly use the Inputs directory to simulate system headers.

NoQ added a comment.Nov 1 2016, 1:07 PM

I think i managed to understand the reasoning behind your solutions! Right now i definitely approve all the high-level logic apart from the handling of left/right SVals for evalAssume, which i think could be easily improved upon without significant drawbacks. See the inline comment.

As for inlining STL headers - ouch. It was supposed to be working (i.e. never inlining), it'd probably be great to know why it gets inlined. STL headers are confusing much more often than helpful to the analyzer in most cases. That said, if we're going to ever revert this decision, i think it's great to have more stuff already working, so i'd not worry about that. If moving stuff to a header defeats the purpose of some of your tests (eg. tests that specifically test what happens if the function is inlined), then probably it'd be a good idea to duplicate the tests, eg:

// RUN: ... -DUSE_HEADER=0 ...
// RUN: ... -DUSE_HEADER=1 ...

#if USE_HEADER
#include "Inputs/..."
#else
// Paste header here.
#endif
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
196

Had a look. So, essentially, the core copies argument values to parameter regions in enterStackFrame() without ever notifying checkers about it in any way. Okaay. Yep, let's stick to that for now, as i've no better approach in mind.

424

Had a look. So the problem is, we obtain the result of the comparison as a symbol, from which it is too hard to recover the operands in order to move iterator position data from one value to another.

Normally we obtain a simple SymbolConjured for the return value of the operator==() call (or, similarly, operator!=()). For plain-value iterators (eg. typedef T *iterator) we might be obtaining an actual binary symbolic expression, but even then it's not quite clear how to obtain operands (the structure of the expression might have changed due to algebraic simplifications). Additionally, LHS and RHS aren't necessarily symbols (they might be semi-concrete), so composing symbolic expressions from them in general is problematic with our symbol hierarchy, which is rarely a problem for numbers but for structural symbols it'd be a mess.

For now i suggest, instead of storing only the last LHS and RHS, to save a map from symbols (which are results of comparisons) to (LHS value, RHS value) pairs. This map should, apart from the obvious, be cleaned up whenever one of the iterators in the pair gets mutated (incremented or decremented). This should take care of the problem Alexey points out, and should work with semi-concrete stuff.

For the future i suggest to let users construct their own symbols and symbolic expressions more easily. In fact, if only we had all iterators as regions, we should have probably used SymbolMetadata for this purpose: it's easy to both recover the parent region from it and use it in symbolic expressions. We could also deprecate the confusing structural symbols (provide default-bound lazy compound values for conjured structures instead), and then it'd be possible to transition to SymbolMetadata entirely.

NoQ added inline comments.Nov 1 2016, 2:18 PM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
581

Ouch, i have one more concern, which can be expressed with the following false-positive test which currently fails:

void foo() {
  std::vector<int> vec;
  vec.push_back(2016);
  auto i = vec.find(vec.begin(), vec.end(), 2016);
  *i; // no-warning
}

Not instantly sure what to do with this. You can avoid state splits until you are actually sure if both branches are possible, but that'd suppress a lot of useful positives. Such positives could be suppressed with assertions, of course, but i'd still hope there aren't too many of those.

NoQ added inline comments.Nov 1 2016, 2:19 PM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
581

I mean, std::find(... ><

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
424

Thank you for the suggestion. I am not sure if I fully understand you. If I create a map where the key is the resulting symbol of the comparison, it will not work because evalAssume is called for the innermost comparison. So if the body of operator== (or operator!=) is inlined, then I get a binary symbolic expression in evalAssume, not the SymbolConjured. This binary Symbolic expression is a comparison of the internals of the iterators, e.g. the internal pointer. So the key will not match any LHS and RHS value pair in the map. I also thought on such solution earlier but I dismissed it because of this.

581

False positives can occur whenever we are sure that we will find the element so we do not check for the result to be equal with end().

baloghadamsoftware edited edge metadata.

Interim version, updated according to some of the comments.

baloghadamsoftware marked 9 inline comments as done.Nov 7 2016, 7:43 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
338

Instead of swapping I moved the code into a separate function and I call this functions now with differenet parameters.

574

There is an overload that does not requires a tag, but it requires a type instad.

baloghadamsoftware marked an inline comment as done.Nov 9 2016, 7:57 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
424

Maybe if I evaluate the operator==() call for iterators using evalCall()?

NoQ added a comment.Nov 9 2016, 10:46 AM

Sorry for inactivity, been thinking quite a bit about this checker. The checker is very cool because it is an excellent showcase of our API problems in the realm of C++ checkers. Once the checker is committed, we could try various things to make it easier to develop other checkers like this in the future. Also the check is very useful, and improving C++ support in the analyzer is very desired, so again thank you for your work.

Right now the course of action, i think, is to

  • Agree on the evalAssume() implementation (i'm still not quite understanding what the problem is here, see the new inline comments);
  • Add some more comments into the code (especially comment up all the object-copy handling, when iterator state moves from one symbol/region to another symbol/region upon various events).

Then, i think, we should land the commit, assuming that you have a desire to address more issues in subsequent commits to eventually enable it by default.

For enabling by default, the following should most likely be addressed:

  • We should probably not warn by default on unchecked std::find (see comments following the push_back(2016) example), unless some strong arguments against such code patterns are provided;
  • A BugReporterVisitor should be added to report iterator state changes to the user across the diagnostic path;
  • Our code owners often have strong opinions regarding warning message wording.

Then there are a few ideas on finding more bugs, which you shouldn't necessarily implement, that came up during the review, eg.:

  • Alexey suspects that iterators implemented as plain pointers (commonly used in LLVM itself) might be unsupported;
  • Alexey points out that ++/-- may be handled in a less conservative manner;
  • More checks could be implemented in this checker, eg. passing end() as first argument of std::find() is an instant bug (somebody accidentally swapped begin and end?).

A list of ideas on improving core experience, mostly for myself because i seem to be the only one with strong opinions on this:

  • Provide a checker callback for structure copies, which would unify the multitude of similar callbacks in this checker;
  • Consider removing the conjured structure symbol hack.

Did i forget anything?

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
205

This code definitely deserves comments. I managed to understand that this is a workaround for completely replacing the conjured symbol with a lazy value upon calling a method over temporary, which the core does from time to time, and i suspect that this code may break whenever more than one checker starts doing this (i.e. you'd have to skip more than one predecessor node in this case).

I still think that the root cause here is conjured structural symbols which i'd probably prefer to get rid of completely, and then this hack wouldn't be necessary.

424

Well, even if the body of the comparison operator is inlined, PreStmt/PostStmt callbacks should still work, and it doesn't really matter if there's a SymbolConjured or not, we can still add the symbolic expression to our map as a key.

Essentially, you ignore whatever happens in the iterator's operator==() when it's inlined (including any evalAssume events), then in PostStmt of operator==() you map the return-value symbol of the operator to the operator's arguments (operands), then whenever an assumption is being made against the return-value symbol, you carry over this assumption to the operands. I think it shouldn't really matter if the operator call was inlined.

The only unexpected thing that may happen due to inlining is if the inlined operator returns a concrete value (plain true or plain false) instead of the symbol, but in this case what we need to do is to just carry over the assumption to the operands instantly.

581

Yep, so there's a bit of grey area here. The test case i wrote is very artificial, i.e. it is not idiomatic, in fact there aren't many cases when doing find() is actually useful when we're sure the element is there.

However, if we eventually enable this checker by default (move out of the alpha.* package), then i think we need to come up with a better behavior for this case:

  • maybe it depends on container type (eg. for map-like containers we may know that the key is there but we don't know the value?);
  • maybe it's a good idea to add a checker option to enable or disable the warning upon using unchecked find results;
  • maybe we'd learn to reason about containers a bit better, even though it'd be hard.

So i've a feeling this can be moved to a FIXME/later, but it's definitely something to think about.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
424

Sorry, maybe my phrasing was not accurate enough. The problem is that the assumption is not being made against the return-value symbol of the operator==(), but if inlined, against the internal == operator. So I do not have the same key in evalAssume() thus I cannot access the operands from the map I stored in checkPostCall(). The only solution I can imagine is that I evalCall() the operator==() but then we lose the opportunity to check anything inside the body of the operator.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
205

I think I do not fully understand you here: do you mean some fix in the core?

zaks.anna added inline comments.Nov 10 2016, 11:38 AM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
220

At least one advantage of the assert is that it provides an error message. I'd not try to minimize the number of asserts.

424

Thanks for working on this!!!

We've discussed this with Artem and Devin in more detail and here are the notes from the conversation.

Just to summarize, Artem's proposal is to replace the two trates for RHS and LHS with a map from a symbol that represents the result of the iterator comparison to LHS SVal, RHS SVal, and the relation between them (== | !=).

Are you concerned about this case:

bool operator==(const it&RHS) {

return x == RHS.x; // If evalAssume is called here, we are just going to ignore it.

} // We get a post call and can fill in the map from binary symbolic expression to LHS and RHS.

You are right, we will get a binary symbolic expression and not SymbolConjured. And we will not fill in the map until the return from the inlined operator. However, even if the operator is inlined, we will be calling PostCall on it after the return. So at that point, the (binary symbolic expression) -> (LHS, RHS, ==) entry will be added to the map, where the LHS and RHS will be the arguments to the call. The evalAssume will be called on the caller side.

Another example:
bool operator==(const it&RHS) {

if (x == RHS.x)
  return y = RHS.y;
return false; // <- Constant is returned.

}
In this case, a concrete value is returned on one of the branches. The suggestion is not to rely on evalAssume, but record the relation of the iterators based on the value of the constant being returned. When the expression is evaluated on the caller side, one of the branches will be unreachable anyway, so we will not loose precision here even if we do nothing on evalAssume.

Also, could you please add examples that use the inlined and non-inlined operators in the following way to make sure everything still works:

if ( ! (i==e) )

Very Important: You should test your patch with eagerly assume option turned on since this mode the analyzer is on by default and running without eagerly assume is outdated. An option to run without eagerly assume should be removed altogether.

Updated according to comments.

baloghadamsoftware marked 10 inline comments as done.Nov 17 2016, 6:14 AM
baloghadamsoftware added inline comments.
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
210

You are right, and the same is true for PreCall.

220

I agree, but I think Alexei is rigt here: cast<> already has the assert we need here.

323

That will be part of another checker, but where exactly to put the comment you suggest?

424

OK, I did it. My initial problem was that I believed that the return value in checkPostCall will be different from the symbolic expression representing the internal comparison, but no, it was the same. I also put a new trick into evalAssume for the negated case you mention. Furthermore, if eagerly assume is enabled, we get concrete integer as result in checkPostCall so we process the iterator there in this case.

In the automatic test I cannot test inlined operators, because it does not inline anything that is included from a remote file. But I tested it manually, everything seems to work.

baloghadamsoftware marked 2 inline comments as done.EditedNov 17 2016, 6:31 AM
In D25660#590778, @NoQ wrote:
  • Agree on the evalAssume() implementation (i'm still not quite understanding what the problem is here, see the new inline comments);

I think it will be settled soon.

  • We should probably not warn by default on unchecked std::find (see comments following the push_back(2016) example), unless some strong arguments against such code patterns are provided;

It is automatic. The role of evalCall is only to reduce the exploded graph. If I remove it, we get the same result (that is why we have a nonStdFind there, to check this case). but with far more states. Especially in case of vector, where the GNU implementation is quite complicated because of optimizations.

  • A BugReporterVisitor should be added to report iterator state changes to the user across the diagnostic path;

I also thought of this. The question is where to start the chain.

  • Our code owners often have strong opinions regarding warning message wording.

I need suggestions here.

Then there are a few ideas on finding more bugs, which you shouldn't necessarily implement, that came up during the review, eg.:

  • Alexey suspects that iterators implemented as plain pointers (commonly used in LLVM itself) might be unsupported;

I think it is supported now.

  • Alexey points out that ++/-- may be handled in a less conservative manner;

That is a future plan, but then it also results in a new name for the checker, e.g. IteratorRange.

  • More checks could be implemented in this checker, eg. passing end() as first argument of std::find() is an instant bug (somebody accidentally swapped begin and end?).

Good idea, but what if it is intentional? I do not mean that we pass end() directly, but if we do a loop of find() functions where the beginning of the next range is always the successor of the last found element, we may result in a range of [end(), end()[, which I think is a valid empty range:

const auto start = v.begin();
while(true) {
   const auto item = find(start, v.end());
   if(item==v.end())
      break;
   doSomething(*item);
   start = ++item;
}

A list of ideas on improving core experience, mostly for myself because i seem to be the only one with strong opinions on this:

  • Provide a checker callback for structure copies, which would unify the multitude of similar callbacks in this checker;

A callback? Or just move the copy into a simple (or template?) function?

  • Consider removing the conjured structure symbol hack.

Which hack do you mean here? In evalCall() of the various std functions? As I mentioned, they can be removed, but then we will get more states in the exploded graph.

Did i forget anything?

zaks.anna added inline comments.Nov 17 2016, 8:58 AM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
424

You can test the inlining case by turning on inlining of containers. I think it's important to add a test since the logic is somewhat complicated and it's possible the analyzer will change the treatment of containers in the future. Here is the option. I'd just add another test case with that option enabled:

/// Returns whether or not methods of C++ container objects may be considered
/// for inlining.
///
/// This is controlled by the 'c++-container-inlining' config option, which
/// accepts the values "true" and "false".
bool mayInlineCXXContainerMethods();

Test updated to include test case where system headers are inlined.

baloghadamsoftware marked an inline comment as done.Nov 18 2016, 7:48 AM
test/Analysis/Inputs/system-header-simulator-for-iterators.h
62 ↗(On Diff #78527)

Maybe we should merge this file with the system-header-simulator-cxx.h? It already contains a vector type but no iterators.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
205

I am not sure why I am handleing CXXOperatorCall here. Instead, I should handle every call, but only instance calls. For final solution would it not be better to make the checker explicitely metrialize a temporary object here instead of just creating it silently? Then my existing checker function would catch it.

NoQ accepted this revision.Nov 29 2016, 5:10 AM
NoQ added a reviewer: NoQ.
In D25660#590778, @NoQ wrote:
  • Agree on the evalAssume() implementation (i'm still not quite understanding what the problem is here, see the new inline comments);

I think it will be settled soon.

This part makes a lot of sense to me now, cool!

Hmm, so we model !($x) as $x == 0. That's tricky. Maybe we should also consider a test like if ( (i == v.end()) == true ); once it's done, we're be doing as good of a job as RangeConstraintManager does on numeric symbols, which would be great and not worth improving further.

  • We should probably not warn by default on unchecked std::find (see comments following the push_back(2016) example), unless some strong arguments against such code patterns are provided;

It is automatic. The role of evalCall is only to reduce the exploded graph. If I remove it, we get the same result (that is why we have a nonStdFind there, to check this case). but with far more states. Especially in case of vector, where the GNU implementation is quite complicated because of optimizations.

Yep, i agree that some kind of evalCall is useful. However, it's now causing more positives than it should, and i think this behavior needs to be eventually avoided, because false positives are very scary - eg. we should try to end up with one state instead of two. Because by splitting states, we declare the possibility of both branches, which in this case is not always correct.

  • A BugReporterVisitor should be added to report iterator state changes to the user across the diagnostic path;

I also thought of this. The question is where to start the chain.

At least, the very last state update to the region that failed (without copies) should be easy to support. Copies would be tricky - i'm thinking of tagging nodes where copies happened with special program point tags that help us understand which region was the source for the copy.

  • More checks could be implemented in this checker, eg. passing end() as first argument of std::find() is an instant bug (somebody accidentally swapped begin and end?).

Good idea, but what if it is intentional? I do not mean that we pass end() directly, but if we do a loop of find() functions where the beginning of the next range is always the successor of the last found element, we may result in a range of [end(), end()[, which I think is a valid empty range:

const auto start = v.begin();
while(true) {
   const auto item = find(start, v.end());
   if(item==v.end())
      break;
   doSomething(*item);
   start = ++item;
}

I misread the docs, sorry><

A list of ideas on improving core experience, mostly for myself because i seem to be the only one with strong opinions on this:

  • Provide a checker callback for structure copies, which would unify the multitude of similar callbacks in this checker;

A callback? Or just move the copy into a simple (or template?) function?

A callback would certainly be better, because it removes a lot of boilerplate from the checker (to subscribe to one callback instead of five would be great). But that's a future plan, not for this patch.

  • Consider removing the conjured structure symbol hack.

Which hack do you mean here? In evalCall() of the various std functions? As I mentioned, they can be removed, but then we will get more states in the exploded graph.

I've just made an attempt in D27202.


I think this is good to go as an alpha checker!
I'm still in favor of more comments in this code.
One more minor inline nit.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
460

This produces a -Wparentheses warning, i think we should silence it by putting an extra () around operator = because the assignment is intentional here.

This revision is now accepted and ready to land.Nov 29 2016, 5:10 AM

It's awesome to see that all the major issues have been addressed. Thank you for working on this and diligently working through the code review!!!

I have a few minor comments below.

Could you add this example yours as a "no-warning" test case:
const auto start = v.begin();
while(true) {

const auto item = find(start, v.end());
if(item==v.end())
   break;
doSomething(*item);
start = ++item;

}

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
167

How about: "C++ STL Error" -> "Misuse of STL APIs"

387

Please, quote svn revision number instead of phabricator number.

396

You could simplify the code a bit by moving all these identifier lookups into a subrutine and/or just have a single statement guard checking f they have been initialized.

446

I agree with Artem that the future readers and maintainers of this code would greatly benefit if there were higher level comments explaining how this checker works. For example, here, we are saving the information about the comparison because iterators are value types...

722

Would isInStdNamespace() from BugReporterVisitor.cpp be useful here? It would be fine to add this API to the CheckerContext or some other place accessible from here and the BugReporter.

733

This could be useful for other checkers as well. Maybe refactor this out as part of a subsequent commit?

test/Analysis/Inputs/system-header-simulator-for-iterators.h
62 ↗(On Diff #78527)

Yes, we the headers are supposed to be reusable for different checkers!

test/Analysis/iterator-past-end.cpp
74

The error message is not very good for the find API cases. There is only a possibility of access past end. Also its much better to be explicit about what went wrong here - the user forgot to check the return value of find. We could say something like "The value returned from 'find' needs to be checked before it's accessed".

We'd need to implement a custom BugReporterVisitor that detects if the iterator is a return value from some method that needs checking. This can be & should be a separate patch.

Also, have you evaluated this on real codebases? What results do you see? Are there any false positives found? Are there any true positives found?

baloghadamsoftware edited edge metadata.
baloghadamsoftware marked 4 inline comments as done.

Minor corrections, comments, some new tests, test input headers merged.

baloghadamsoftware added a comment.EditedDec 8 2016, 2:05 AM

D27202 is now a dependency, but cannot add it.

lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
167

OK, I copied it from another checker :-)

zaks.anna added inline comments.Dec 8 2016, 9:10 AM
lib/StaticAnalyzer/Checkers/IteratorPastEndChecker.cpp
722

Is there a reason not to use isInStdNamespace() instead of the inTopLevelNamespace()? We can add the API to Checker Context.

NoQ added a comment.EditedDec 9 2016, 10:25 AM

A quick example of how a bug reporter visitor for this checker may look like - it needs to be expanded significantly but here's a start:

<== example of how it looks.

See, for example, MallocChecker to understand the rest of the bureaucracy around bug reporter visitors.

Thanks Artem!

Just to be clear, I think this patch should be committed once "inTopLevelNamespace" issue is addressed. That is the only issue pending as far as I can see.

The visitor should be a separate patch.

Now isInStdNamespace is used. Hack is back so D27202 is not a dependency now.

Also, have you evaluated this on real codebases? What results do you see? Are there any false positives found? Are there any true positives found?

I am doing it right now. Unfortunately I found a crash which I fixed, but then it turned out that overwrites of the iterator variable are not handled. I am working on this problem.

I am doing it right now. Unfortunately I found a crash which I fixed,

Is it fixed in this patch?

but then it turned out that overwrites of the iterator variable are not handled. I am working on this
problem.

My suggestion is to commit this patch and address the iterator variable overwrites separately, so that it would be more incremental and easier to review. Does this sound good to you?

zaks.anna accepted this revision.Dec 16 2016, 9:41 AM
zaks.anna edited edge metadata.

And thank you for the awesome work and addressing the review comments!!!

This revision was automatically updated to reflect the committed changes.