This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Iterator Checker - Part 9: Evaluation of std::find-like calls
AbandonedPublic

Authored by baloghadamsoftware on May 5 2017, 6:23 AM.

Details

Summary

This patch adds explicit evaluation of the following functions: std::find, std::find_end, std::find_first_of, std::find_if, std::find_if_not, std::lower_bound, std::upper_bound, std::search and std::search_n. On the one hand this is an optimization since the result of each of these functions is an iterators either inside the range or the iterator past the end of the range. This evaluation does exactly this. (We cannot simulate "inside the range" just an iterator bound to the same container with a new offset). On the other hand this evaluation is needed for random access operators since some STL implementations do some optimization where the length of the range is used instead of iterating until the end. This is hard to track in a checker so we must do this evaluation to prevent false positives and catch more real bugs.

Diff Detail

Event Timeline

NoQ added inline comments.Dec 14 2017, 4:03 PM
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1584–1588

We discussed this in D25660 but i couldn't find what we decided upon. It seems scary to me that in every situation, we declare that it is possible that the element is not in the container. It should be allowed, even if not necessarily efficient, to add the element to the container, and then use std::find to obtain an iterator to it. So i think that this state split, even if generally approved by your users, may need to stay under an analyzer checker option.

george.karpenkov resigned from this revision.May 30 2018, 4:51 PM
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1584–1588

Maybe instead of an option, I should put it into a separate checker that does not emit warnings just simulates these functions, like you did it in rC284960. The file name could be StdCppLibraryFunctions.cpp. Common functions and macros should then be moved into a header that both StdLibraryFunctions.cpp and StdCppLibraryFunctions.cpp includes.

NoQ added inline comments.Oct 14 2018, 8:48 PM
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1584–1588

Right now i don't think we're ready for finding a good generalized way of describing C++ method summaries. Even the mechanism in the C library checker is veeeery limited. In C++ i'd put a good generalized model for dealing with opaque symbolic structures as an important pre-requisite. But if you have anything specific in mind, i'd be excited to hear what you propose.

Also i still believe that state split is incorrect and leads to inevitable false positives even when all the information that's necessary to prevent these false positives is in fact available to the Analyzer. And for that reason i hesitate to put this logic into a checker that most people will need to enable by default.

On the other hand, because you support == and !=, these false positives might be suppress-able with asserts. So if they're rare, they're probably not that bad. But i'd still put it under a flag.

Actually, yeah, emptiness of the container might be pretty easy to model. It'll let you return .end() when .begin() is called, etc. You can declare that the container is empty when it's freshly default-constructed or a fresh return value of .empty() is assumed to be true. That's not enough to fix .find() modeling, just random thoughts.

NoQ added a comment.Oct 14 2018, 9:05 PM

Unfortunately, we are at the beginning of a long road. I will post several new patches that we already test internally. However the only checker with acceptable false-positive rate is the invalidated-iterator checker. The mismatched-iterator still has high false-positive rate for iterator-iterator mismatches. For iterator-container mismatches it is acceptable. The iterator-range checker also has still too many false positives.

These 1.5 checkers are a lot, in my opinion. They're preventing undefined behavior that can be very easy to introduce accidentally. I think they're very valuable. Non-powerusers would love to have these checkers turned on by default, and while this is entirely up to you to decide, my vague idea of the greater good suggests that focusing on converging to something not-feature-rich-but-deliverable and delivering that something to the users is a good thing to do, especially given that it seems almost ready, while there's a long road ahead to finish other checkers.

Like, if you have 3 tasks and spend 1/3 of your time on each of them, all of them will be done in 3 units of time, but if you first spend 1 unit of time on the first task, then 1 unit of time on the second task, then 1 unit of time on the third task, then even though time to complete all three tasks remains at 3 units you have the benefit of the first and the second task being completed and starting to bring benefit much earlier.

baloghadamsoftware abandoned this revision.EditedNov 29 2019, 2:08 AM

The STL find() family is modeled in a separate checker: Model STL Algoirthms to improve the iterator checkers