This is the first part of the new Iterator Checker. This part contains the very core infrastructure. It only checks for out-of-range iterators in a very simple case.
Details
- Reviewers
zaks.anna NoQ - Commits
- rG8fa639ecf06b: [analyzer] Initial commit for the upcoming refactoring of the IteratorChecker.
rC304160: [analyzer] Initial commit for the upcoming refactoring of the IteratorChecker.
rL304160: [analyzer] Initial commit for the upcoming refactoring of the IteratorChecker.
Diff Detail
Event Timeline
Thank you very much!
I think you might have forgotten to include the newly added files in this review(?)><
I see the checker going towards what we call "metadata symbols", which seems logical.
I still feel the checker deserves a lot more comments around the code. In particular, i'd definitely love to see a human-readable explanation of what kinds of iterators are supported (from plain pointers or integers to opaque structures): you did great job abstracting away these differences in the high-level logic, but i'd like to demonstrate how does the low-level logic (eg. offsets and comparisons, i.e. these moments when you're operating on various SVal objects directly) works in all these cases, because this part of the checker requires the most ingenuity to handle, while the rest is more or less straightforward.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
211–214 | These functions are currently unused; my compiler warns about that, and buildbots would probably yell at us when we commit it, so i guess it's better to add them when they are actually used; they'd also be tested as well. | |
219 | Before we forget: Range -> range. As we've noticed recently in D32702 :), we don't capitalize every word in bug types and categories. | |
294–297 | I guess this deserves a test case (we could split this out as a separate feature as well). I'm also afraid that we can encounter false positives on functions that are not STL algorithms. I suggest doing this by default only for STL functions (or maybe for other specific classes of functions for which we know it works this way) and do this for other functions under a checker option (i.e. something like -analyzer-config IteratorChecker:AggressiveAssumptions=true, similar to MallocChecker's "Optimistic" option). | |
323–324 | This callback is currently untested as well. I'm doing this bit of manual "mutation testing" by removing pieces of code and re-running tests because not only we'd rather keep every commit self-sufficient, but also we'd rather make sure we didn't forget anything, which is much easier to do patch-by-patch. | |
478–479 | I see what you did here! And i suggest considering SymbolMetadata here instead of SymbolConjured, because it was designed for this purpose: the checker for some reason knows there's a special property of an object he wants to track, the checker doesn't have a ready-made symbolic value to represent this property (because the engine didn't know this property even exists, so it didn't denote its value), so the checker comes up with its own notation. SymbolMetadata is used, for example, in CStringChecker to denote an unknown string length for a given null-terminated string region. This symbol carries the connection to the region (you may use it to reverse-lookup the region if necessary), and in particular it is considered to be live as long as the base region is live (so you don't have to deal with liveness manually; see also comments around SymbolReaper::MarkInUse). It's also easier to debug, because we can track the symbol back to the checker. Generally, symbol class hierarchy is cool because we know a lot about the symbol by just looking at it, and SymbolConjured is a sad fallback we use when we didn't care or manage to come up with a better symbol. I'm not sure if LongTy is superior to IntTy here, since we don't know what to expect from the container anyway. Also, please de-duplicate the symbol creation code. Birth of a symbol is something so rare it deserves a drum roll :) I'd take a pause to figure out if the same logic should be applied to the map from containers to end-iterators. | |
513 | One more unused function. | |
515–516 | One more unused function. |
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
211–214 | Sorry, I forgot to delete it. I will do it. | |
219 | Agree. | |
294–297 | I will check whether this piece of code could be moved in a later part of the checker. However, I suggest to first wait for the first false positives before we introduce such an option. This far the false positives in my initial tests had different reasons, not this one. | |
478–479 | SymbolMetaData is bound to a MemRegion. Iterators are sometimes symbols and sometimes memory regions, this was one of the first lessons I learned from my first iterator checker. | |
513 | OK. | |
515–516 | I thought I removed it. |
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
478–479 | Oh. Hmm. Ok. Right. To be sure: in what cases do you need to create a new symbol when the iterator is already a symbol? How broken do we become if we try to say that the symbolic iterator and its own offset are the same thing? |
Noticed a few more things.
It sounds as if once this first patch lands, the rest should be easy :)
Regarding the comments in the code. I materialized my wishes to something like:
- At the top of the file:
// In the code, iterator can be represented as a: // * type-I: typedef-ed pointer. Operations over such iterator, such as comparisons or increments, are modeled straightforwardly by the analyzer. // * type-II: structure with its method bodies available. Operations over such iterator are inlined by the analyzer, and results of modeling these operations are exposing implementation details of the iterators, which is not necessarily helping. // * type-III: completely opaque structure. Operations over such iterator are modeled conservatively, producing conjured symbols everywhere. // // Additionally, depending on the circumstances, operators of types II and III can be represented as: // * type-IIa, type-IIIa: conjured structure symbols - when returned by value from conservatively evaluated methods such as `.begin()`. // * type-IIb, type-IIIb: memory regions of iterator-typed objects, such as variables or temporaries, when the iterator object is currently treated as an lvalue. // * type-IIc, type-IIIc: compound values of iterator-typed objects, when the iterator object is treated as an rvalue taken of a particular lvalue, eg. a copy of "type-a" iterator object, or an iterator that existed before the analysis has started.
Not sure if type-IIa iterators actually make sense. It's ok if you come up with your own classification :)
Then, in methods that deal with iterator SVals directly, i wish we had hints explaining what's going on in these ~7 cases. In my opinion, that'd greatly help people understand the code later, and it'd help us understand how to avoid this variety and provide checker authors with a better API as soon as we get to this, so it's the biggest concern for me about this checker.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
16 | This header seems unused for now. | |
22 | This header seems unused for now. | |
153–155 | Carryover from the other review: did you try using RegionOrSymbol as a key and keep only one map? | |
166–167 | I've a bit of doubt about those. Would they call their constructors every time clang starts, regardless of whether the analyzer or the checker is enabled? Maybe having them as private variables inside the checker class would be better? As in http://llvm.org/docs/CodingStandards.html#do-not-use-static-constructors | |
294–297 | Unfortunately, we've had a poor experience with this approach in other checkers. You never know, and it seems that it's always better to have a safe fallback mode available under a flag, because if a few classes of false positives are found, and we are forced to reduce the checker to a safer behavior, it'd be hard to remember all the places where unsafe heuristics were used. | |
478–479 | All right, i guess it wasn't a great idea. Even if iterators are plain pointers, the offset is a pointer difference. I still have this feeling that the analyzer is not good enough for this checker yet, so at least it's great that we have it moving. I once wished we had "metadata regions" for symbols and regions, and then automatically gaining symbols for these region values (http://lists.llvm.org/pipermail/cfe-dev/2016-May/049000.html), that would have made things a lot cooler maybe. Never mind. | |
test/Analysis/iterator-range.cpp | ||
1–2 | Could we add run-lines without -analyzer-eagerly-assume? Currently all variants are passing, but if new tests will fail, we could #ifndef them out. | |
9 | I'd suggest sticking clang_analyzer_warnIfReached() here to demonstrate that there is no warning because the branch is dead. That'd make the test stronger. |
I am not sure which method do you mean? I think here the crucial functions are the setIteratorPosition() and the getIteratorPosition() functions which provide a common way to handle all these SVals.
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
166–167 | Not needed in this patch. I delete them from here. | |
219 | Also "out of" instead of "of out" :-) | |
294–297 | I think this heuristic is well marked by the comment, easy to find if it causes false positives. When I started working on Clang (Tidy first) reviewers discouraged me to add options before experiencing false positives. | |
323–324 | OK, removed from the first patch. | |
test/Analysis/iterator-range.cpp | ||
1–2 | In my first checker Anna suggested to always use this option. She also wrote that she plans to remove possibility to execute the Analyzer without eagerly assume. |
Sorry, got carried away by GSoC and critical stuff...
Hmm, i think i mostly mean evalAssume() and the comparisons mechanism. You're having multiple cases, like:
(1) if the assumed value is a BinarySymExpr for which there's a stored comparison (good for type-II iterators),
(2) if it's a symbol that was conjured by a conservatively evaluated comparison operator (here you lookup the opcode from the referenced expression) for which there's a stored comparison (good for type-III iterators),
(3) if it's a binary symbolic expression of form ($x == 0) or ($x != 0) where $x is one of (1) or (2).
I believe that the whole idea behind storing comparison results should have high-level comments explaining how it works (being tricky but solid - essentially it extends the constraint manager, taking over when handling iterator constraints, which sounds like the right thing to do).
I think we should land after that; i only have comment-related comments now :)
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp | ||
---|---|---|
132 | Strcutre -> Structure :) | |
294–297 | Well, it's a bit sad to say, but that's one of the things that we didn't agree upon with clang-tidy, i guess; historically, they started as a lint-like tool that produces nice-to-fix-anyway kinds of warnings (having nice bug-finding checks nowadays), and we always kept positioning ourselves as a "quiet" tool that reports only real critical bugs for users that never would be annoyed immensely by every single false positive (doesn't make this true though), and it makes us nervous about every single potential false positive class. I guess we could leave that as a task for later, with a "FIXME: Add a more conservative mode". | |
test/Analysis/iterator-range.cpp | ||
1–2 | Ouch, right. Suffering from short-term memory loss :) |
I believe we can move on to the next one :) Just hope we didn't screw up the rebase too much here. Thanks again!
This header seems unused for now.