This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] [WIP] Standard C++ library functions checker for the std::find() family
AbandonedPublic

Authored by baloghadamsoftware on Nov 6 2018, 6:00 AM.

Details

Reviewers
NoQ
Summary

Simulate the std::find()-like functions based on function summaries similar to the standard C library functions checker. The summary is very simple here: the item(s) looked for can either be found in the container or not. In this latter case these functions return the iterator denoting the end of the range.

Without this checker the std::find()-like functions are either inlined or not but it does not help: their implementation is too complex for the Static Analyzer to track them successfully. This implies that based on their real implementations the iterator range checker cannot find cases where the programmer forgot to check whether the element was found in the container, the end of the range was the end() of the container and a dereference is made on the result.

This solution is still work in progress. The macro language describing summaries cannot deal with complex types yet. We need a way to describe parameters of template functions instead of using Irrelevant for all non-integer arguments. For example, SAME_AS(arg_num) could be used to describe that two arguments have the same type.

ComparesToArgument does not work for complex types so we needed a new constraint type BoundToArgument.

The stand-alone test does not work since the analyzer cannot handler the equality of variables with complex types. However the new tests in the iterator range checker pass if this checker is also enabled.

Diff Detail

Event Timeline

baloghadamsoftware edited the summary of this revision. (Show Details)Nov 6 2018, 6:14 AM
NoQ added a comment.Nov 30 2018, 3:32 PM

Such checker is useful. It'd be great to add a lot of "pure" functions into it, simply for the sake of pointing out that they have no side effects, i.e. avoid unnecessary invalidation.

The bound-to-argument constraint is useful. It avoids the alpha-renaming problem by assigning the correct value right away instead of stuffing an aliasing condition into the solver.

I still believe that the idea to split the state on std::find is questionable, no matter how exactly it's implemented. std::find does more than tells whether the element is in the container, so calling it just for its other effect (to obtain the iterator to the element) when you already know that the element is in the container is not an indication of a dead code, even if it is a bad code smell in most cases. I think it is very likely that people will come to us with false positives of that kind and we'll have to revert this behavior. Would you be OK to keep support for std::find() under an option, off by default but available when the user opts into lint checks? This could be implemented by making a new check kind (eg., CK_OptimisticStdCXXLibraryFunctionsChecker).

Random thought: i still think that implementing the "object value id" thing (a symbol-like id that is associated with the memory region of the object and gets invalidated when the object is logically mutated but gets copied as-is to the new object upon copy/move-constructors or assignments) is a good way to refactor all these things. But this work seems orthogonal to what you did here (i.e., we'll have to perform the copy of the "id" here, but that's just a separate task). So this patch shouldn't be blocked on that.

Another random thought: it'd be great to make a bug reporter visitor that highlights state splits produced by this checker. Eg., in your case it would produce an event note that says something like "Assuming object is not present in iterator range". Otherwise the user would be unable to understand why do you think that the iterator is bad. Or for, say, ispunct() it would say "Assuming character is punctuation symbol".

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
159–161

I think it's important to document that bound-to-argument should always be used instead of compares-to-argument when the comparison operator is == and the constraint is applied to return values of fully evaluated functions.

375–376

Hmmmmmmmm.

std::find is a function that returns a C++ object by value.

For such functions it is not enough to bind the returned prvalue of the object. It is important to realize that while it is an equal iterator, it's not the same object. Which is why it is necessary to foresee the storage for the newly constructed iterator, so that destruction/materialization/copy elision worked correctly. See the respective logic in ExprEngine::bindReturnValue():

611   if (auto RTC = getCurrentCFGElement().getAs<CFGCXXRecordTypedCall>()) {
612     // Conjure a temporary if the function returns an object by value.
613     SVal Target;
614     assert(RTC->getStmt() == Call.getOriginExpr());
615     EvalCallOptions CallOpts; // FIXME: We won't really need those.
616     std::tie(State, Target) =
617         prepareForObjectConstruction(Call.getOriginExpr(), State, LCtx,
618                                      RTC->getConstructionContext(), CallOpts);
// ...
627   } // ...

It's bad that checkers currently need to do that manually when they implement evalCall() for C++ functions that return objects. We need to come up with a better API, eg. State->BindReturnValue(same arguments) that prepares for object construction (i.e., move the part of the functionality of ExprEngine::bindReturnValue() that's responsible for binding into a ProgramState method, and only keep the functionality that's responsible for conjuring the value).

test/Analysis/std-cxx-library-functions.cpp
7–11

I think that's a great standalone test.

The modeling framework in its current state cannot be used for detailed modeling of complex functions. Even detailed modeling (including handling of GDM data for checkers) of standard C functions are done manually: see C string and stream checkers. Template functions of the C++ STL are more complex, creating a generic modeling framework would be huge multi-year project for which we currently lack the resources. So we decided to model these functions manually: Model STL Algoirthms to improve the iterator checkers