This is an archive of the discontinued LLVM Phabricator instance.

Differential D53701

[Analyzer] Instead of recording comparisons in interator checkers do an eager state split
Closed · Public

Authored by baloghadamsoftware on Oct 25 2018, 7:26 AM.

Details

Reviewers

NoQ
Szelethus

Commits

rG54976e76e672: [Analyzer] Instead of recording comparisons in interator checkers do an eager…
rL358951: [Analyzer] Instead of recording comparisons in interator checkers do an eager…
rC358951: [Analyzer] Instead of recording comparisons in interator checkers do an eager…

Summary

Currently the iterator checkers record comparisons of iterator positions and process them to keep track of the distance between them (e.g. whether a position is the same as the end position). However, this makes some processing unnecessarily complex, and it is not needed at all: we only need to keep track of the relations between the abstract symbols stored in these iterator positions. This patch makes that change and opens the path to comparisons against the begin() and end() symbols of the container (e.g. for size or emptiness), which are stored as symbols, not as iterator positions. The functionality of the checker is unchanged.

Diff Detail

Repository: rC Clang

Event Timeline

baloghadamsoftware created this revision. Oct 25 2018, 7:26 AM

Herald added a reviewer: george.karpenkov. Oct 25 2018, 7:26 AM

Herald added subscribers: donat.nagy, Szelethus, mikhail.ramalho and 4 others.

Makes sense!

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
38	Did you mean lazy compound values?
172–173	typo: for
299–300	I think this name would be better if you added the new state to `CheckerContext` within this function (and made it `void`), or renamed it to `getEvaluatedComparisonState`.
339	I think `evaluateComparison` would be a more fitting name.
1140–1141	You will have to excuse me for commenting on something totally unrelated, but I'm afraid this may cause a crash, if the region returned by `getSuperRegion` is symbolic. I encountered this error when doing the exact same thing in my checker: D50892. Can something like this occur with this checker?
1990–1991	It isn't obvious to me (at first) what happens here -- maybe document when this function will return with `nullptr`? When `relateSymbol` is called and checked whether the returned value is null or not, one could think that this symbolizes some sort of failure.

Ok, so what this code does is, eg., for a call of i1.operator==(i2) that returns a symbol $x of type bool, it conserves the sacred checker-specific knowledge that $x is in fact equal to $offset(i1) == $offset(i2), where $offset(i) is the offset symbol within IteratorPosition to which iterator i is mapped.

This looks like a renaming problem to me: what we really want to do is rename (i.e. update representation in the SVal hierarchy of) $x to $offset(i1) == $offset(i2). And for now the single plausible approach to renaming problems in the Static Analyzer is to avoid them: give the value a correct name from the start so that you don't have to rename it later.

What this means is that instead of waiting until checkPostCall to obtain $x and then trying to rename it into $offset(i1) == $offset(i2), we should evalCall the comparison operator to return $offset(i1) == $offset(i2), so that the symbol with the wrong name (i.e., $x) doesn't appear in the program state in the first place.

The good thing about this approach is that it does not require any extra tracking at all - no comparison maps, no evalAssume(), nothing. The value is simply correct from the start.
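As a rough illustration of "give the value a correct name from the start", here is a toy model in plain C++. It does not use the analyzer's real API: `ToyState`, `conservativeCall` and `evalComparison` are hypothetical names made up for this sketch, and symbols are just strings.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy program state: maps each iterator variable to the name of the
// offset symbol the checker tracks for it. (Hypothetical model, not
// the analyzer's ProgramState.)
struct ToyState {
  std::map<std::string, std::string> OffsetOf;
  int NextConjured = 0;
};

// Conservative evaluation: the call returns a fresh opaque symbol
// ($x0, $x1, ...) that has no link to the iterator offsets.
inline std::string conservativeCall(ToyState &S) {
  return "$x" + std::to_string(S.NextConjured++);
}

// evalCall-style evaluation: the return value is built directly from
// the two offset symbols, so no later renaming is needed.
inline std::string evalComparison(const ToyState &S, const std::string &I1,
                                  const std::string &I2) {
  return "(" + S.OffsetOf.at(I1) + " == " + S.OffsetOf.at(I2) + ")";
}
```

In this toy, the conservative path yields an unrelated name such as `$x0`, while the evalCall-style path yields `($off1 == $off2)` when `i1` and `i2` are mapped to `$off1` and `$off2`.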

As a side effect, you will have a chance (though still not forced) to avoid redundant invalidation of Store during evaluation of the operator call. This is the correct behavior for at least STL containers and probably for all containers ever created by mankind. Though of course you never know. I.e., what if the code under analysis measures complexity of vector sort procedure and increments a global counter every time two iterators are compared within that procedure? But at least for STL/boost/LLVM/WebKit containers this is probably the right thing to do.

Now, of course, evalCall() would suppress inlining. During evalCall() we currently do not know whether the function will be inlined or evaluated conservatively if none of the checkers evaluate it, but we can easily provide such information in evalCall(), so this is not a problem.

The problem with inlining is that we got names for iterator offsets wrong from the start, because we conjured them out of thin air and they are conflicting with the actual representation of offsets within the implementation of the container. Which brings us back to a renaming problem: the problem of renaming $offset(i1) == $offset(i2) into the actual return value $x that was computed via inlining. Moreover, this new renaming problem is ill-formed because renaming non-atomic symbols doesn't make any sense - we should instead rename $offset(i1) and $offset(i2) separately. Still, the problem is indeed, as you already noticed, solved easily when $x is concrete: we only need to assume that $offset(i1) == $offset(i2) or $offset(i1) != $offset(i2) depending on the concrete value of $x.

And if $x is symbolic, we could still introduce a state split here: on one branch both $x and $offset(i1) == $offset(i2) are false, on the other branch they both are true, and no additional tracking is ever necessary. I believe that such a state split is not invalid: it pretty much corresponds to the "eagerly assume" behavior, just for iterators. The question here is how sure we are that both branches are possible. Even if neither inlining nor our custom iterator model managed to refute any of these two branches, one of the paths may still be infeasible. But the amount of error here is not larger than with eagerly-assume, and for eagerly-assume it isn't that bad, so we could try. Of course, the alternative to a state split is assuming things about ($offset(i1) == $offset(i2)) == $x, but our constraint solver will not be able to handle such constraints, which is the very reason why we have problems with renaming (well, at least some of them; renaming temporary regions in C++ is slightly more complicated than that (understatement intended)). In fact, i think the reasoning behind having eager assumptions for numbers is exactly that: they wanted to make constraints simpler.
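The split described above can be sketched with a toy model, assuming a path only knows two facts: the value of $x and whether the two offsets are equal. `Tri`, `ToyPath` and `splitOnComparison` are hypothetical names for this illustration and not part of the checker.

```cpp
#include <cassert>
#include <vector>

// Three-valued knowledge about a boolean fact on one analysis path.
enum class Tri { Unknown, False, True };

// Toy path state: what this path knows about $x (the comparison result)
// and about $offset(i1) == $offset(i2).
struct ToyPath {
  Tri X;
  Tri OffsetsEqual;
};

// True if assuming `Value` does not contradict what is already known.
inline bool compatible(Tri Known, bool Value) {
  return Known == Tri::Unknown || Known == (Value ? Tri::True : Tri::False);
}

// Successor paths after an iterator `==` comparison. On every surviving
// path $x and the offset relation agree, so nothing has to be recorded
// for later. A concrete $x leaves one path; a contradiction leaves none.
inline std::vector<ToyPath> splitOnComparison(const ToyPath &In) {
  std::vector<ToyPath> Out;
  for (bool Value : {false, true}) {
    if (!compatible(In.X, Value) || !compatible(In.OffsetsEqual, Value))
      continue; // this branch is refuted by existing knowledge
    const Tri T = Value ? Tri::True : Tri::False;
    Out.push_back(ToyPath{T, T});
  }
  return Out;
}
```

With both facts unknown the toy produces two paths (false first, then true); with a concrete $x it produces exactly one, which mirrors the concrete case discussed earlier.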

But still, i want to step back and ask the question that i really really want answered here: if container methods are inlinable, do we actually want to track iterator positions manually? Maybe just rely on inlining entirely when possible? Like, both for modeling and for bug-finding. Or only rely on evalCall()? Essentially, if inlining is not reliable enough for relying on it entirely (so we have to maintain a parallel checker-specific model of iterators and have these two models exchange information), why do we think that it is reliable enough for the purpose of evaluating iterator comparisons?

This question is in fact asked and answered, one way or another, intentionally or accidentally, with different levels of urgency, in every checker that tries to model effects of function calls. The most famous example of a conscious approach to this problem (overstatement intended) so far is RetainCountChecker, which has thousands of lines of code devoted solely to figuring out whether a function should be evaluated by the checker or inlined. Once the checker decides to rely on inlining while *also* modeling its own specific effects, inlining starts conflicting with modeling and all hell breaks loose.

So i believe it is very important to ask ourselves here: do we really want to mix our own symbolic model of iterators with the implicit model of iterators as plain structures in the Store that is automatically managed via inlining? Because if only we decide to either trust inlining entirely or not trust it at all, things become *much* easier. I expect that the amount of effort to keep these two models consistent with each other will explode very quickly as we add more features to the checker.

Now, the interesting part about keeping these two models consistent in this checker is that it is entirely a "constraint-like" problem: our effort consists entirely of adding new information about immutable entities (symbols) to the state. The state doesn't mutate - it is only being clarified. And when the state is actually mutated (eg., when we are modeling operator++()), no effort to maintain consistency is needed. This is very different from the problem we're having in RetainCountChecker, where it is a "store-like" problem, i.e. function calls we're trying to model are actively mutating the state that might have already been mutated within the inlined call, i.e. they're competing for the same data. And i believe that the difference is entirely in implementation details of these two checkers: there isn't that much difference in kinds of bugs they find, so i guess it's an interesting food for thought.

So, to summarize:

The eager state split solution is not all that bad here, and is also much easier to implement than the delayed state split you're trying to implement in this patch. Generally, any information produced by inlining is most likely pretty accurate and should, in a perfect world, be incorporated into the checker's state, given that the checker already decided to track some sort of internal information.
Still, i want to give you a heads up that the idea of maintaining an implicit (via checker traits and evalCall()) and an explicit (via Store and inlining) models of iterators simultaneously may result in very quick explosion in complexity. We already went into this direction because we didn't think about it this way, but i think it's never too late to reconsider.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1999	P.S. What was the idea here? Like, `CompSym` was just computed via `BO_EQ` and has type of a condition, i.e. `bool` because we are in C++. Is this code trying to say that the result of the comparison is bounded by `true/2`?

Your suggestion sounds completely reasonable. We cannot rely entirely on inlining of comparison operators. However, there is an important side-effect of inlining: on iterator-adapters inlining ensures that we handle the comparison of the underlying iterator correctly. Without inlining, evalCall() only works on the outermost iterator which is not always correct: it may happen that the checker cannot conclude any useful relation between the two iterator-adapters but there are some relations stored about the inner iterators. Based on my experiments quite many of the false-positives are related to iterator-adapters. So I am afraid that we introduce even more false-positives by losing inlining.

I wonder whether the current "mixed" approach introduces additional paths because we do not do explicit state-splits and function processComparison() removes contradicting branches.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1999	There is also a `->getLHS()`, which means that we enforce the bound on the left-hand side of the rearranged comparison. Although both symbols are bounded by `max/4`, the constraint manager does not infer that the sum/diff is bounded by `max/2`. I have to enforce this manually to prevent `min` from being negated to `min` when the constraint manager negates the difference.
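The arithmetic behind this remark can be checked with a small standalone sketch. It assumes 64-bit two's-complement offsets; `Offset`, `wrappingNegate` and `withinBound` are hypothetical helpers for illustration, not the checker's or the constraint manager's code.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

using Offset = std::int64_t;
constexpr Offset Max = std::numeric_limits<Offset>::max();
constexpr Offset Min = std::numeric_limits<Offset>::min();

// Two's-complement negation done in unsigned arithmetic, so the toy has
// no signed-overflow UB. The conversion back to a signed type is modular
// on mainstream compilers (and guaranteed to be since C++20).
inline Offset wrappingNegate(Offset V) {
  return static_cast<Offset>(0 - static_cast<std::uint64_t>(V));
}

// True if |V| <= Max / Scale.
inline bool withinBound(Offset V, Offset Scale) {
  return V >= -(Max / Scale) && V <= Max / Scale;
}
```

Here `wrappingNegate(Min)` gives `Min` again, which is the case the manual bound is meant to exclude. If two offsets each satisfy `withinBound(V, 4)`, their difference satisfies `withinBound(D, 2)`, so it can never be `Min` and negating it is safe.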

In D53701#1287007, @baloghadamsoftware wrote:

...on iterator-adapters inlining ensures that we handle the comparison of the underlying iterator correctly. Without inlining, evalCall() only works on the outermost iterator which is not always correct: it may happen that the checker cannot conclude any useful relation between the two iterator-adapters but there are some relations stored about the inner iterators. Based on my experiments quite many of the false-positives are related to iterator-adapters. So I am afraid that we introduce even more false-positives by losing inlining.

Mmm, is it possible to detect adapters and inline them as an exception from the rule? You can foresee a pretty complicated system of rules and exceptions if we go down this path, but i believe that it is still much easier and more reliable than the system that tries to synchronize two different models of iterators, so i really encourage you to at least give it a try somehow.

I wonder whether the current "mixed" approach introduces additional paths because we do not do explicit state-splits and function processComparison() removes contradicting branches.

I believe that the effect of eager state splits is going to be roughly similar to the actual -analyzer-eagerly-assume:

Split paths earlier than it is absolutely necessary, which may slow down the analysis and duplicate some work, but most of the time it'll be just a few nodes before the actual check in the code would have caused a split anyway.
Simplify constraint solving on each of the new paths, which will indeed help with refuting some invalid paths, especially those in which new constraints over the symbols are introduced after the check, but that's due to peculiarities of our constraint solver, i.e. accidentally rather than intentionally.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
1140–1141	Hmm, had a look at the crash you mention here. Your code crashed because you additionally did `getAs<TypedValueRegion>`, which would turn your pointer into a null when a symbolic region is encountered. So the final code ended up a bit weird: there's no need to check that it's a `TypedValueRegion` before you check that it's a `CXXBaseObjectRegion`; just check for the latter directly. This code looks correct in this sense. Also, since this code keeps getting copied around, do we need a better helper function for this unwrap idiom? I.e., something like `MemRegion::StripCasts()` that only unwraps derived-to-base casts?
1999	Ouch, right, didn't notice `getLHS()`, sorry!

In D53701#1288258, @NoQ wrote:

Mmm, is it possible to detect adapters and inline them as an exception from the rule? You can foresee a pretty complicated system of rules and exceptions if we go down this path, but i believe that it is still much easier and more reliable than the system that tries to synchronize two different models of iterators, so i really encourage you to at least give it a try somehow.

I am almost sure it is impossible. It is not only about iterator-adapters in the classical sense but also any iterator that internally deals with other iterators.

For example, I created an iterator in the past that iterates over the elements of a list of lists. This iterator contained two iterators, one for the outer and one for the inner list. In this case determining whether two iterators are the same is non-trivial: for all in-range iterators both the outer and the inner iterators must be the same. However, if the outer iterator is the past-end iterator of the outer list then the inner iterator is irrelevant. Thus the comparison operator of such an iterator must first check whether any of the outer iterators is the past-end iterator and only compare the inner iterators if none of them is. If both outer iterators are the past-end iterator then they are equal regardless of the inner iterators.
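A minimal sketch of such a comparison operator, assuming a `std::list` of `std::list<int>`. `FlatIterator` and `flatEnd` are hypothetical names; this is not the iterator from the comment above, only one way the described rule could be written.

```cpp
#include <cassert>
#include <iterator>
#include <list>

using Inner = std::list<int>;
using Outer = std::list<Inner>;

// Hypothetical iterator over all elements of a list of lists.
struct FlatIterator {
  Outer::const_iterator OuterIt;
  Outer::const_iterator OuterEnd;
  Inner::const_iterator InnerIt; // meaningful only while OuterIt != OuterEnd

  friend bool operator==(const FlatIterator &L, const FlatIterator &R) {
    const bool LAtEnd = L.OuterIt == L.OuterEnd;
    const bool RAtEnd = R.OuterIt == R.OuterEnd;
    // If either side is past-the-end, the inner iterators are irrelevant:
    // the two are equal exactly when both are past-the-end.
    if (LAtEnd || RAtEnd)
      return LAtEnd && RAtEnd;
    // In range: both levels must match. The inner comparison only runs
    // when both inner iterators point into the same inner list.
    return L.OuterIt == R.OuterIt && L.InnerIt == R.InnerIt;
  }
  friend bool operator!=(const FlatIterator &L, const FlatIterator &R) {
    return !(L == R);
  }
};

// Past-the-end iterator with a value-initialized (unused) inner iterator.
inline FlatIterator flatEnd(const Outer &O) {
  return FlatIterator{O.end(), O.end(), Inner::const_iterator{}};
}
```

A model that only looks at the tracked position of the outermost object cannot see the early return for the past-the-end case, which is the knowledge that inlining the operator would recover.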

Another example could be an iterator that somehow merges two lists internally using two different iterators. In this case only one of the inner iterators is relevant when comparing two merging iterators.

So by dropping the inlining we lose some intrinsic knowledge of the analyzed code which leads the checker to wrong assumptions.

I wonder whether the current "mixed" approach introduces additional paths because we do not do explicit state-splits and function processComparison() removes contradicting branches.

I believe that the effect of eager state splits is going to be roughly similar to the actual -analyzer-eagerly-assume:

Split paths earlier than it is absolutely necessary, which may slow down the analysis and duplicate some work, but most of the time it'll be just a few nodes before the actual check in the code would have caused a split anyway.

Simplify constraint solving on each of the new paths, which will indeed help with refuting some invalid paths, especially those in which new constraints over the symbols are introduced after the check, but that's due to peculiarities of our constraint solver, i.e. accidentally rather than intentionally.

When I first started working on Clang Static Analyzer Anna told me the -analyzer-eagerly-assume should be the default. In the iterator checkers the refutation happens intentionally in the functions relateSymbols() and processComparison().

In D53701#1288682, @baloghadamsoftware wrote:

I am almost sure it is impossible. It is not only about iterator-adapters in the classical sense but also any iterator that internally deals with other iterators.

For example, I created an iterator in the past that iterates over the elements of a list of lists. This iterator contained two iterators, one for the outer and one for the inner list. In this case determining whether two iterators are the same is non-trivial: for all in-range iterators both the outer and the inner iterators must be the same. However, if the outer iterator is the past-end iterator of the outer list then the inner iterator is irrelevant. Thus the comparison operator of such an iterator must first check whether any of the outer iterators is the past-end iterator and only compare the inner iterators if none of them is. If both outer iterators are the past-end iterator then they are equal regardless of the inner iterators.

Another example could be an iterator that somehow merges two lists internally using two different iterators. In this case only one of the inner iterators is relevant when comparing two merging iterators.

So by dropping the inlining we lose some intrinsic knowledge of the analyzed code which leads the checker to wrong assumptions.

The nested iterator thing looks easy to detect heuristically. Just have a look if any of the fields within the object are of iterator type.

I think it's worth a try to do a total evalCall at first, and then disable evalCall (together with the attempt to model the iterator position) in an incrementally growing blacklist of cases (1. iterator adaptors, 2. ....) as we encounter problems. This essentially becomes part of the logic that decides whether an object is an iterator. Eg., if it's more like an adaptor than an actual iterator, let's treat it as if it's not an iterator, but inline instead, and hope that we model the underlying iterators correctly via evalCall.
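The heuristic suggested here could look roughly like the following toy, which describes types with a hand-written struct instead of the Clang AST. `ToyType`, `wrapsAnotherIterator`, `Modeling` and `chooseModeling` are hypothetical, and how a type is classified as iterator-like is assumed to be decided elsewhere.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy description of a record type. In the real checker this information
// would come from the AST; here it is filled in by hand.
struct ToyType {
  std::string Name;
  bool IsIteratorLike;                 // result of some "is this an iterator" test
  std::vector<const ToyType *> Fields; // types of the direct fields
};

// Heuristic from the comment above: does any field have an iterator type?
inline bool wrapsAnotherIterator(const ToyType &T) {
  for (const ToyType *Field : T.Fields)
    if (Field->IsIteratorLike)
      return true;
  return false;
}

enum class Modeling { NotAnIterator, EvalCall, Inline };

// Plain iterators get evalCall; adaptor-like ones are left to inlining.
inline Modeling chooseModeling(const ToyType &T) {
  if (!T.IsIteratorLike)
    return Modeling::NotAnIterator;
  return wrapsAnotherIterator(T) ? Modeling::Inline : Modeling::EvalCall;
}
```

Only direct fields are inspected in this sketch; base classes and nested aggregates would need extra cases.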

This does look hacky, but it does look less hacky than trying to align two models of the iterator position. Or is it actually necessary to maintain the two models of the iterator position in order to avoid these false positives? If so, could you give an example?

When I first started working on Clang Static Analyzer Anna told me the -analyzer-eagerly-assume should be the default.

Which is why i suggest a similar behavior here.

Herald added a subscriber: gamesh411. Nov 30 2018, 2:03 PM

In D53701#1315242, @NoQ wrote:

I think it's worth a try to do a total evalCall at first, and then disable evalCall (together with the attempt to model the iterator position) in an incrementally growing blacklist of cases (1. iterator adaptors, 2. ....) as we encounter problems. This essentially becomes part of the logic that decides whether an object is an iterator. Eg., if it's more like an adaptor than an actual iterator, let's treat it as if it's not an iterator, but inline instead, and hope that we model the underlying iterators correctly via evalCall.

I think that only tracking the inner iterator and treating the outer iterator as a non-iterator is a nightmare from the user's perspective: all detections seem to be "internal" errors of the underlying API and thus regarded as "probably false positives". When using iterator adapters of the STL the bugs may also be filtered out by the analyzer if this option is enabled. The user must see the errors on the topmost level whenever possible to understand them.

In D53701#1318566, @baloghadamsoftware wrote:

In D53701#1315242, @NoQ wrote:

When using iterator adapters of the STL the bugs may also be filtered out by the analyzer if this option is enabled.

Mmm, are you talking about c++-container-inlining? This option currently prevents inlining of container methods. STL iterators shouldn't be affected, and even if they were affected, they will simply fall back to conservative evaluation.

On the other hand, this is how i *want* this option to work. I.e., instead of suppressing inlining, it should suppress bugs via visitors when an interesting event happens within a container, where "interesting" is potentially checker-specific and defined in every visitor separately. I believe that most checkers will be unaffected. Additionally, it still doesn't affect iterators.

I think that only tracking the inner iterator and treating the outer iterator as a non-iterator is a nightmare from the user's perspective: all detections seem to be "internal" errors of the underlying API and thus regarded as "probably false positives". (...) The user must see the errors on the topmost level whenever possible to understand them.

Well, i mean, whenever you are inlining a method, you are exposing details of the "internals" of the inlined API to the user. The only difference is whether this API itself deals with iterators. This sounds to me as if we try not to inline iterator methods in general. Or try really hard to prevent desynchronization between the two models.

Ok, how about an eager state split for now?

MTC added a subscriber: MTC. Dec 6 2018, 6:22 PM

In D53701#1322255, @NoQ wrote:

Well, i mean, whenever you are inlining a method, you are exposing details of the "internals" of the inlined API to the user. The only difference is whether this API itself deals with iterators. This sounds to me as if we try not to inline iterator methods in general. Or try really hard to prevent desynchronization between the two models.

When tracking multi-level iterator structures (iterator adapters or other constructs, e.g. a leaf-iterator in a general tree) we mainly want to track the topmost level, but it is inaccurate without inlining their methods. We need to track the real control flow. However, when reporting a bug we want to report it at the topmost level since the erroneous code is most probably the top-level code, not the API. That way we do not directly expose the layers below the API to the user. That is the reason that we have to handle all structures which behave as iterators as iterators, even if they contain iterators themselves. Since error-checking happens in the checkPre...() hooks, bugs are automatically caught at the topmost level.

Ok, how about an eager state split for now?

Do you mean that upon iterator comparison I do not record the comparison and later retrieve it in the branches with concrete results but I do the state-split myself in checkPostCall()?

Herald added a subscriber: jdoerfert. Feb 20 2019, 5:32 AM

Instead of recording comparisons do an eager state split if the result is a symbolic value.

baloghadamsoftware retitled this revision from [Analyzer] Record and process comparison of symbols instead of iterator positions in interator checkers to [Analyzer] Instead of recording comparisons in interator checkers do an eager state split. Feb 20 2019, 10:12 AM

baloghadamsoftware removed a reviewer: george.karpenkov.

baloghadamsoftware removed a subscriber: donat.nagy.

processComparison() refactored.

Szelethus added a reviewer: Szelethus. Mar 14 2019, 7:44 AM

Herald added a subscriber: Charusso. Mar 14 2019, 7:44 AM

Ping!

NoQ added inline comments. Mar 25 2019, 3:02 PM

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
823–832	Welcome to the `addTransition()` hell! Each of the `assignToContainer()` calls may add a transition, and then `processComparison()` also adds transitions. I suspect that it may lead to more state splits than were intended. I.e., the execution path on which the iterator is assigned to a container would be different from the two execution paths on which the comparison was processed. You can chain `addTransition()`s to each other, eg.:

  // Return the node produced by the inner addTransition()
  ExplodedNode *N = assignToContainer(...);
  // And then in processComparison(N, ...)
  C.addTransition(N->getState()->assume(*ConditionVal, false), N);
  C.addTransition(N->getState()->assume(*ConditionVal, true), N);

It should also be possible to avoid transitions until the final state is computed, if the code is easy enough to refactor this way:

  // No addTransition() calls within, just produce the state
  ProgramStateRef State = assignToContainer(...);
  // And then in processComparison(N, ...)
  C.addTransition(State->assume(*ConditionVal, false), N);
  C.addTransition(State->assume(*ConditionVal, true), N);

This sort of stuff can be tested via `clang_analyzer_numTimesReached()` - see if you made exactly as many state splits as you wanted to.
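To make the difference in topology concrete, here is a toy graph in plain C++ that only counts leaf nodes. `ToyGraph`, `unchainedLeaves` and `chainedLeaves` are hypothetical names; the sketch assumes that a transition added without an explicit predecessor hangs off the node the checker callback started from.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy exploded graph: Pred[i] is the predecessor of node i, -1 for the root.
struct ToyGraph {
  std::vector<int> Pred;

  int addNode(int Predecessor) {
    Pred.push_back(Predecessor);
    return static_cast<int>(Pred.size()) - 1;
  }

  // Number of nodes without successors, i.e. paths that continue from here.
  std::size_t numLeaves() const {
    std::vector<bool> HasSuccessor(Pred.size(), false);
    for (int P : Pred)
      if (P >= 0)
        HasSuccessor[static_cast<std::size_t>(P)] = true;
    std::size_t Leaves = 0;
    for (bool B : HasSuccessor)
      if (!B)
        ++Leaves;
    return Leaves;
  }
};

// Unchained: the helper's transition and both comparison branches all
// start from the same node, giving three sibling paths.
inline std::size_t unchainedLeaves() {
  ToyGraph G;
  const int Root = G.addNode(-1);
  G.addNode(Root); // transition added by the container-assignment helper
  G.addNode(Root); // comparison assumed false
  G.addNode(Root); // comparison assumed true
  return G.numLeaves();
}

// Chained: both branches are added on top of the helper's node.
inline std::size_t chainedLeaves() {
  ToyGraph G;
  const int Root = G.addNode(-1);
  const int N = G.addNode(Root); // node returned by the helper
  G.addNode(N);                  // comparison assumed false
  G.addNode(N);                  // comparison assumed true
  return G.numLeaves();
}
```

In this toy the unchained variant ends with three leaves and the chained one with two, which is the kind of difference a path-counting regression test would expose.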

NoQ added inline comments. Mar 25 2019, 5:21 PM

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
823–832	My feel is that a better `.addTransition()` API should capture the user's intent more straightforwardly, so that we could check dynamically that the resulting topology is indeed exactly what the user expects. I.e., produce multiple narrow-purpose APIs for common patterns: `C.updateState(State)`, `C.splitState(State1, State2, ..., StateN)` - both would fail if there were previous transitions in the same `CheckerContext` or if more transitions are made after them. The `updateState()` variant should probably try to lazily collapse multiple updates into a single node. Maybe instead don't require all branches to be specified simultaneously, i.e. instead do `addBranch(State)` that wouldn't fail in presence of other branches but would still conflict with `updateState()`. These narrow-purpose APIs are too clumsy to cover the current use-case, but at least they would've caught the problem. Maybe a better design could make it also comfortable to use.
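One possible shape for such intent-capturing calls, as a toy only. `ToyContext` is hypothetical and states are plain integers; the real `CheckerContext` has no `updateState()` or `addBranch()`, and this sketch implements the strict variant in which `updateState()` rejects any earlier transition instead of collapsing updates.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy context that remembers which kind of transition was made and
// rejects combinations that would produce an unintended topology.
class ToyContext {
  enum class Mode { None, Updated, Branched };
  Mode Current = Mode::None;
  std::vector<int> Successors; // toy stand-ins for program states

public:
  // A single linear update; fails if any transition was already made.
  void updateState(int State) {
    if (Current != Mode::None)
      throw std::logic_error("updateState() after another transition");
    Current = Mode::Updated;
    Successors.push_back(State);
  }

  // One branch of a split; may be repeated, but not mixed with updateState().
  void addBranch(int State) {
    if (Current == Mode::Updated)
      throw std::logic_error("addBranch() conflicts with updateState()");
    Current = Mode::Branched;
    Successors.push_back(State);
  }

  std::size_t numSuccessors() const { return Successors.size(); }
};
```

With this toy, an update followed by two branches throws at the first `addBranch()`, so the accidental three-way split would be caught at run time rather than discovered later.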

Fixed double transition.

Looks great, thanks!

You can still add the regression test for the correct number of transitions if you want - even if it's an NFC patch, it's nice to know that we didn't regress something we could have accidentally regressed.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
936–937	It looks as if you moved the function but forgot to move the comment.

This revision is now accepted and ready to land. Apr 1 2019, 2:34 PM

Closed by commit rC358951: [Analyzer] Instead of recording comparisons in interator checkers do an eager… (authored by baloghadamsoftware). Apr 23 2019, 12:14 AM

This revision was automatically updated to reflect the committed changes.

bjope added a subscriber: bjope. Apr 23 2019, 2:18 AM

bjope added inline comments.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp

825

This is an uninitialized version of Sym that will be used on line 835 and line 839.
The Sym variable assigned on line 828 is local to the scope starting at line 826.

Not really sure, but was the idea perhaps to use the Sym value from line 828 on lines 835 and 839?
Then I guess you can rewrite this as:

// At least one of the iterators have recorded positions. If one of them has
// not then create a new symbol for the offset.
if (!LPos || !RPos) {
  auto &SymMgr = C.getSymbolManager();
  auto Sym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
                                  C.getASTContext().LongTy, C.blockCount());
  State = assumeNoOverflow(State, Sym, 4);

  if (!LPos) {
    State = setIteratorPosition(State, LVal,
                                IteratorPosition::getPosition(Cont, Sym));
    LPos = getIteratorPosition(State, LVal);
  } else if (!RPos) {
    State = setIteratorPosition(State, RVal,
                                IteratorPosition::getPosition(Cont, Sym));
    RPos = getIteratorPosition(State, RVal);
  }
}

As it is right now I get complaint about using an uninitialized value for Sym (so this patch still doesn't build with -Werror after the earlier fixup).
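The scoping problem described above can be reproduced in isolation. This sketch is based only on the description in this comment, not on the committed code; the function names are hypothetical, and the outer variable is initialized to 0 so that the toy stays well defined.

```cpp
#include <cassert>

// Problematic shape: the inner `Sym` is a new variable that shadows the
// outer one, so the assignment never reaches the variable used later.
inline int shadowedSymbol(bool NeedsSymbol) {
  int Sym = 0; // left uninitialized in the code under review
  if (NeedsSymbol) {
    int Sym = 42; // local to this block only
    (void)Sym;
  }
  return Sym; // always the outer variable
}

// Shape of the suggested rewrite: create the symbol and use it within
// one scope, so no outer variable is needed at all.
inline int scopedSymbol(bool NeedsSymbol) {
  int Result = 0;
  if (NeedsSymbol) {
    const int Sym = 42;
    Result = Sym; // every use of Sym stays inside the block
  }
  return Result;
}
```

The first function returns 0 even when the symbol was "created", while the second returns the created value.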

Revision Contents

Path	Size
lib/StaticAnalyzer/Checkers/IteratorChecker.cpp	386 lines

Diff 196190

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp

// Additionally, depending on the circumstances, operators of types II and III		// Additionally, depending on the circumstances, operators of types II and III
// can be represented as:		// can be represented as:
// * type-IIa, type-IIIa: conjured structure symbols - when returned by value		// * type-IIa, type-IIIa: conjured structure symbols - when returned by value
// from conservatively evaluated methods such as		// from conservatively evaluated methods such as
// `.begin()`.		// `.begin()`.
// * type-IIb, type-IIIb: memory regions of iterator-typed objects, such as		// * type-IIb, type-IIIb: memory regions of iterator-typed objects, such as
// variables or temporaries, when the iterator object is		// variables or temporaries, when the iterator object is
// currently treated as an lvalue.		// currently treated as an lvalue.
// * type-IIc, type-IIIc: compound values of iterator-typed objects, when the		// * type-IIc, type-IIIc: compound values of iterator-typed objects, when the
// iterator object is treated as an rvalue taken of a		// iterator object is treated as an rvalue taken of a
// particular lvalue, eg. a copy of "type-a" iterator		// particular lvalue, eg. a copy of "type-a" iterator
// object, or an iterator that existed before the		// object, or an iterator that existed before the
// analysis has started.		// analysis has started.
//		//
// To handle any of these three different representations stored in an SVal we		// To handle any of these three different representations stored in an SVal we
// use setter and getters functions which separate the three cases. To store		// use setter and getters functions which separate the three cases. To store
// them we use a pointer union of symbol and memory region.		// them we use a pointer union of symbol and memory region.
[... 81 unchanged lines hidden; context: public: ...]

void Profile(llvm::FoldingSetNodeID &ID) const {		void Profile(llvm::FoldingSetNodeID &ID) const {
ID.AddPointer(Cont);		ID.AddPointer(Cont);
ID.AddInteger(Valid);		ID.AddInteger(Valid);
ID.Add(Offset);		ID.Add(Offset);
}		}
};		};

typedef llvm::PointerUnion<const MemRegion *, SymbolRef> RegionOrSymbol;

// Structure to record the symbolic begin and end position of a container		// Structure to record the symbolic begin and end position of a container
struct ContainerData {		struct ContainerData {
private:		private:
const SymbolRef Begin, End;		const SymbolRef Begin, End;

ContainerData(SymbolRef B, SymbolRef E) : Begin(B), End(E) {}		ContainerData(SymbolRef B, SymbolRef E) : Begin(B), End(E) {}

public:		public:
[... 20 unchanged lines hidden; context: bool operator!=(const ContainerData &X) const { ...]
return Begin != X.Begin || End != X.End;		return Begin != X.Begin || End != X.End;
}		}

void Profile(llvm::FoldingSetNodeID &ID) const {		void Profile(llvm::FoldingSetNodeID &ID) const {
ID.Add(Begin);		ID.Add(Begin);
ID.Add(End);		ID.Add(End);
}		}
};		};

// Structure fo recording iterator comparisons. We needed to retrieve the
// original comparison expression in assumptions.
struct IteratorComparison {
private:
RegionOrSymbol Left, Right;
bool Equality;

public:
IteratorComparison(RegionOrSymbol L, RegionOrSymbol R, bool Eq)
: Left(L), Right(R), Equality(Eq) {}

RegionOrSymbol getLeft() const { return Left; }
RegionOrSymbol getRight() const { return Right; }
bool isEquality() const { return Equality; }
bool operator==(const IteratorComparison &X) const {
return Left == X.Left && Right == X.Right && Equality == X.Equality;
}
bool operator!=(const IteratorComparison &X) const {
return Left != X.Left || Right != X.Right || Equality != X.Equality;
}
void Profile(llvm::FoldingSetNodeID &ID) const { ID.AddInteger(Equality); }
};

class IteratorChecker		class IteratorChecker
		Szelethus: typo: for
: public Checker<check::PreCall, check::PostCall,		: public Checker<check::PreCall, check::PostCall,
check::PostStmt<MaterializeTemporaryExpr>, check::Bind,		check::PostStmt<MaterializeTemporaryExpr>, check::Bind,
check::LiveSymbols, check::DeadSymbols,		check::LiveSymbols, check::DeadSymbols> {
eval::Assume> {

std::unique_ptr<BugType> OutOfRangeBugType;		std::unique_ptr<BugType> OutOfRangeBugType;
std::unique_ptr<BugType> MismatchedBugType;		std::unique_ptr<BugType> MismatchedBugType;
std::unique_ptr<BugType> InvalidatedBugType;		std::unique_ptr<BugType> InvalidatedBugType;

void handleComparison(CheckerContext &C, const SVal &RetVal, const SVal &LVal,		void handleComparison(CheckerContext &C, const Expr *CE, const SVal &RetVal,
const SVal &RVal, OverloadedOperatorKind Op) const;		const SVal &LVal, const SVal &RVal,
		OverloadedOperatorKind Op) const;
		void processComparison(CheckerContext &C, ProgramStateRef State,
		SymbolRef Sym1, SymbolRef Sym2, const SVal &RetVal,
		OverloadedOperatorKind Op) const;
void verifyAccess(CheckerContext &C, const SVal &Val) const;		void verifyAccess(CheckerContext &C, const SVal &Val) const;
void verifyDereference(CheckerContext &C, const SVal &Val) const;		void verifyDereference(CheckerContext &C, const SVal &Val) const;
void handleIncrement(CheckerContext &C, const SVal &RetVal, const SVal &Iter,		void handleIncrement(CheckerContext &C, const SVal &RetVal, const SVal &Iter,
bool Postfix) const;		bool Postfix) const;
void handleDecrement(CheckerContext &C, const SVal &RetVal, const SVal &Iter,		void handleDecrement(CheckerContext &C, const SVal &RetVal, const SVal &Iter,
bool Postfix) const;		bool Postfix) const;
void handleRandomIncrOrDecr(CheckerContext &C, OverloadedOperatorKind Op,		void handleRandomIncrOrDecr(CheckerContext &C, OverloadedOperatorKind Op,
const SVal &RetVal, const SVal &LHS,		const SVal &RetVal, const SVal &LHS,
[58 lines not shown]	public:
void checkPostCall(const CallEvent &Call, CheckerContext &C) const;		void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
void checkBind(SVal Loc, SVal Val, const Stmt *S, CheckerContext &C) const;		void checkBind(SVal Loc, SVal Val, const Stmt *S, CheckerContext &C) const;
void checkPostStmt(const CXXConstructExpr *CCE, CheckerContext &C) const;		void checkPostStmt(const CXXConstructExpr *CCE, CheckerContext &C) const;
void checkPostStmt(const DeclStmt *DS, CheckerContext &C) const;		void checkPostStmt(const DeclStmt *DS, CheckerContext &C) const;
void checkPostStmt(const MaterializeTemporaryExpr *MTE,		void checkPostStmt(const MaterializeTemporaryExpr *MTE,
CheckerContext &C) const;		CheckerContext &C) const;
void checkLiveSymbols(ProgramStateRef State, SymbolReaper &SR) const;		void checkLiveSymbols(ProgramStateRef State, SymbolReaper &SR) const;
void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;		void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;
ProgramStateRef evalAssume(ProgramStateRef State, SVal Cond,
bool Assumption) const;
};		};
} // namespace		} // namespace

REGISTER_MAP_WITH_PROGRAMSTATE(IteratorSymbolMap, SymbolRef, IteratorPosition)		REGISTER_MAP_WITH_PROGRAMSTATE(IteratorSymbolMap, SymbolRef, IteratorPosition)
REGISTER_MAP_WITH_PROGRAMSTATE(IteratorRegionMap, const MemRegion *,		REGISTER_MAP_WITH_PROGRAMSTATE(IteratorRegionMap, const MemRegion *,
IteratorPosition)		IteratorPosition)

REGISTER_MAP_WITH_PROGRAMSTATE(ContainerMap, const MemRegion *, ContainerData)		REGISTER_MAP_WITH_PROGRAMSTATE(ContainerMap, const MemRegion *, ContainerData)

REGISTER_MAP_WITH_PROGRAMSTATE(IteratorComparisonMap, const SymExpr *,
IteratorComparison)

namespace {		namespace {

bool isIteratorType(const QualType &Type);		bool isIteratorType(const QualType &Type);
bool isIterator(const CXXRecordDecl *CRD);		bool isIterator(const CXXRecordDecl *CRD);
bool isComparisonOperator(OverloadedOperatorKind OK);		bool isComparisonOperator(OverloadedOperatorKind OK);
bool isBeginCall(const FunctionDecl *Func);		bool isBeginCall(const FunctionDecl *Func);
bool isEndCall(const FunctionDecl *Func);		bool isEndCall(const FunctionDecl *Func);
bool isAssignCall(const FunctionDecl *Func);		bool isAssignCall(const FunctionDecl *Func);
[12 lines not shown]
bool isSimpleComparisonOperator(OverloadedOperatorKind OK);		bool isSimpleComparisonOperator(OverloadedOperatorKind OK);
bool isAccessOperator(OverloadedOperatorKind OK);		bool isAccessOperator(OverloadedOperatorKind OK);
bool isDereferenceOperator(OverloadedOperatorKind OK);		bool isDereferenceOperator(OverloadedOperatorKind OK);
bool isIncrementOperator(OverloadedOperatorKind OK);		bool isIncrementOperator(OverloadedOperatorKind OK);
bool isDecrementOperator(OverloadedOperatorKind OK);		bool isDecrementOperator(OverloadedOperatorKind OK);
bool isRandomIncrOrDecrOperator(OverloadedOperatorKind OK);		bool isRandomIncrOrDecrOperator(OverloadedOperatorKind OK);
bool hasSubscriptOperator(ProgramStateRef State, const MemRegion *Reg);		bool hasSubscriptOperator(ProgramStateRef State, const MemRegion *Reg);
bool frontModifiable(ProgramStateRef State, const MemRegion *Reg);		bool frontModifiable(ProgramStateRef State, const MemRegion *Reg);
bool backModifiable(ProgramStateRef State, const MemRegion *Reg);		bool backModifiable(ProgramStateRef State, const MemRegion *Reg);
BinaryOperator::Opcode getOpcode(const SymExpr *SE);
const RegionOrSymbol getRegionOrSymbol(const SVal &Val);
const ProgramStateRef processComparison(ProgramStateRef State,
RegionOrSymbol LVal,
RegionOrSymbol RVal, bool Equal);
const ProgramStateRef saveComparison(ProgramStateRef State,
const SymExpr *Condition, const SVal &LVal,
const SVal &RVal, bool Eq);
const IteratorComparison *loadComparison(ProgramStateRef State,
const SymExpr *Condition);
SymbolRef getContainerBegin(ProgramStateRef State, const MemRegion *Cont);		SymbolRef getContainerBegin(ProgramStateRef State, const MemRegion *Cont);
		Szelethus: I think this name would be better if you added the new state to `CheckerContext` within this function (and made it `void`), or renamed it to `getEvaluatedComparisonState`.
SymbolRef getContainerEnd(ProgramStateRef State, const MemRegion *Cont);		SymbolRef getContainerEnd(ProgramStateRef State, const MemRegion *Cont);
ProgramStateRef createContainerBegin(ProgramStateRef State,		ProgramStateRef createContainerBegin(ProgramStateRef State,
const MemRegion *Cont,		const MemRegion *Cont,
const SymbolRef Sym);		const SymbolRef Sym);
ProgramStateRef createContainerEnd(ProgramStateRef State, const MemRegion *Cont,		ProgramStateRef createContainerEnd(ProgramStateRef State, const MemRegion *Cont,
const SymbolRef Sym);		const SymbolRef Sym);
const IteratorPosition *getIteratorPosition(ProgramStateRef State,		const IteratorPosition *getIteratorPosition(ProgramStateRef State,
const SVal &Val);		const SVal &Val);
const IteratorPosition *getIteratorPosition(ProgramStateRef State,
RegionOrSymbol RegOrSym);
ProgramStateRef setIteratorPosition(ProgramStateRef State, const SVal &Val,		ProgramStateRef setIteratorPosition(ProgramStateRef State, const SVal &Val,
const IteratorPosition &Pos);		const IteratorPosition &Pos);
ProgramStateRef setIteratorPosition(ProgramStateRef State,
RegionOrSymbol RegOrSym,
const IteratorPosition &Pos);
ProgramStateRef removeIteratorPosition(ProgramStateRef State, const SVal &Val);		ProgramStateRef removeIteratorPosition(ProgramStateRef State, const SVal &Val);
ProgramStateRef adjustIteratorPosition(ProgramStateRef State,		ProgramStateRef assumeNoOverflow(ProgramStateRef State, SymbolRef Sym,
RegionOrSymbol RegOrSym,		long Scale);
const IteratorPosition &Pos, bool Equal);
ProgramStateRef relateIteratorPositions(ProgramStateRef State,
const IteratorPosition &Pos1,
const IteratorPosition &Pos2,
bool Equal);
ProgramStateRef invalidateAllIteratorPositions(ProgramStateRef State,		ProgramStateRef invalidateAllIteratorPositions(ProgramStateRef State,
const MemRegion *Cont);		const MemRegion *Cont);
ProgramStateRef		ProgramStateRef
invalidateAllIteratorPositionsExcept(ProgramStateRef State,		invalidateAllIteratorPositionsExcept(ProgramStateRef State,
const MemRegion *Cont, SymbolRef Offset,		const MemRegion *Cont, SymbolRef Offset,
BinaryOperator::Opcode Opc);		BinaryOperator::Opcode Opc);
ProgramStateRef invalidateIteratorPositions(ProgramStateRef State,		ProgramStateRef invalidateIteratorPositions(ProgramStateRef State,
SymbolRef Offset,		SymbolRef Offset,
[9 lines not shown]
ProgramStateRef reassignAllIteratorPositionsUnless(ProgramStateRef State,		ProgramStateRef reassignAllIteratorPositionsUnless(ProgramStateRef State,
const MemRegion *Cont,		const MemRegion *Cont,
const MemRegion *NewCont,		const MemRegion *NewCont,
SymbolRef Offset,		SymbolRef Offset,
BinaryOperator::Opcode Opc);		BinaryOperator::Opcode Opc);
ProgramStateRef rebaseSymbolInIteratorPositionsIf(		ProgramStateRef rebaseSymbolInIteratorPositionsIf(
ProgramStateRef State, SValBuilder &SVB, SymbolRef OldSym,		ProgramStateRef State, SValBuilder &SVB, SymbolRef OldSym,
SymbolRef NewSym, SymbolRef CondSym, BinaryOperator::Opcode Opc);		SymbolRef NewSym, SymbolRef CondSym, BinaryOperator::Opcode Opc);
		ProgramStateRef relateSymbols(ProgramStateRef State, SymbolRef Sym1,
		Szelethus: I think `evaluateComparison` would be a more fitting name.
		SymbolRef Sym2, bool Equal);
const ContainerData *getContainerData(ProgramStateRef State,		const ContainerData *getContainerData(ProgramStateRef State,
const MemRegion *Cont);		const MemRegion *Cont);
ProgramStateRef setContainerData(ProgramStateRef State, const MemRegion *Cont,		ProgramStateRef setContainerData(ProgramStateRef State, const MemRegion *Cont,
const ContainerData &CData);		const ContainerData &CData);
bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont);		bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont);
bool isBoundThroughLazyCompoundVal(const Environment &Env,		bool isBoundThroughLazyCompoundVal(const Environment &Env,
const MemRegion *Reg);		const MemRegion *Reg);
bool isPastTheEnd(ProgramStateRef State, const IteratorPosition &Pos);		bool isPastTheEnd(ProgramStateRef State, const IteratorPosition &Pos);
[216 lines not shown]	if (isAssignmentOperator(Op)) {
const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call);		const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call);
if (Func->getParamDecl(0)->getType()->isRValueReferenceType()) {		if (Func->getParamDecl(0)->getType()->isRValueReferenceType()) {
handleAssign(C, InstCall->getCXXThisVal(), Call.getOriginExpr(),		handleAssign(C, InstCall->getCXXThisVal(), Call.getOriginExpr(),
Call.getArgSVal(0));		Call.getArgSVal(0));
} else {		} else {
handleAssign(C, InstCall->getCXXThisVal());		handleAssign(C, InstCall->getCXXThisVal());
}		}
} else if (isSimpleComparisonOperator(Op)) {		} else if (isSimpleComparisonOperator(Op)) {
		const auto *OrigExpr = Call.getOriginExpr();
		if (!OrigExpr)
		return;

if (const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call)) {		if (const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call)) {
handleComparison(C, Call.getReturnValue(), InstCall->getCXXThisVal(),		handleComparison(C, OrigExpr, Call.getReturnValue(),
Call.getArgSVal(0), Op);		InstCall->getCXXThisVal(), Call.getArgSVal(0), Op);
} else {		} else {
handleComparison(C, Call.getReturnValue(), Call.getArgSVal(0),		handleComparison(C, OrigExpr, Call.getReturnValue(), Call.getArgSVal(0),
Call.getArgSVal(1), Op);		Call.getArgSVal(1), Op);
}		}
} else if (isRandomIncrOrDecrOperator(Func->getOverloadedOperator())) {		} else if (isRandomIncrOrDecrOperator(Func->getOverloadedOperator())) {
if (const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call)) {		if (const auto *InstCall = dyn_cast<CXXInstanceCall>(&Call)) {
if (Call.getNumArgs() >= 1) {		if (Call.getNumArgs() >= 1) {
handleRandomIncrOrDecr(C, Func->getOverloadedOperator(),		handleRandomIncrOrDecr(C, Func->getOverloadedOperator(),
Call.getReturnValue(),		Call.getReturnValue(),
InstCall->getCXXThisVal(), Call.getArgSVal(0));		InstCall->getCXXThisVal(), Call.getArgSVal(0));
[202 lines not shown]	if (!SR.isLiveRegion(Cont.first)) {
// We must keep the container data while it has live iterators to be able		// We must keep the container data while it has live iterators to be able
// to compare them to the begin and the end of the container.		// to compare them to the begin and the end of the container.
if (!hasLiveIterators(State, Cont.first)) {		if (!hasLiveIterators(State, Cont.first)) {
State = State->remove<ContainerMap>(Cont.first);		State = State->remove<ContainerMap>(Cont.first);
}		}
}		}
}		}

auto ComparisonMap = State->get<IteratorComparisonMap>();
for (const auto Comp : ComparisonMap) {
if (!SR.isLive(Comp.first)) {
State = State->remove<IteratorComparisonMap>(Comp.first);
}
}

C.addTransition(State);		C.addTransition(State);
}		}

ProgramStateRef IteratorChecker::evalAssume(ProgramStateRef State, SVal Cond,		void IteratorChecker::handleComparison(CheckerContext &C, const Expr *CE,
bool Assumption) const {		const SVal &RetVal, const SVal &LVal,
// Load recorded comparison and transfer iterator state between sides		const SVal &RVal,
// according to comparison operator and assumption
const auto *SE = Cond.getAsSymExpr();
if (!SE)
return State;

auto Opc = getOpcode(SE);
if (Opc != BO_EQ && Opc != BO_NE)
return State;

bool Negated = false;
const auto *Comp = loadComparison(State, SE);
if (!Comp) {
// Try negated comparison, which is a SymExpr to 0 integer comparison
const auto *SIE = dyn_cast<SymIntExpr>(SE);
if (!SIE)
return State;

if (SIE->getRHS() != 0)
return State;

SE = SIE->getLHS();
Negated = SIE->getOpcode() == BO_EQ; // Equal to zero means negation
Opc = getOpcode(SE);
if (Opc != BO_EQ && Opc != BO_NE)
return State;

Comp = loadComparison(State, SE);
if (!Comp)
return State;
}

return processComparison(State, Comp->getLeft(), Comp->getRight(),
(Comp->isEquality() == Assumption) != Negated);
}

void IteratorChecker::handleComparison(CheckerContext &C, const SVal &RetVal,
const SVal &LVal, const SVal &RVal,
OverloadedOperatorKind Op) const {		OverloadedOperatorKind Op) const {
// Record the operands and the operator of the comparison for the next		// Record the operands and the operator of the comparison for the next
// evalAssume, if the result is a symbolic expression. If it is a concrete		// evalAssume, if the result is a symbolic expression. If it is a concrete
// value (only one branch is possible), then transfer the state between		// value (only one branch is possible), then transfer the state between
// the operands according to the operator and the result		// the operands according to the operator and the result
auto State = C.getState();		auto State = C.getState();
if (const auto *Condition = RetVal.getAsSymbolicExpression()) {
const auto *LPos = getIteratorPosition(State, LVal);		const auto *LPos = getIteratorPosition(State, LVal);
const auto *RPos = getIteratorPosition(State, RVal);		const auto *RPos = getIteratorPosition(State, RVal);
if (!LPos && !RPos)		const MemRegion *Cont = nullptr;
		if (LPos) {
		Cont = LPos->getContainer();
		} else if (RPos) {
		Cont = RPos->getContainer();
		}
		if (!Cont)
return;		return;
State = saveComparison(State, Condition, LVal, RVal, Op == OO_EqualEqual);
C.addTransition(State);		// At least one of the iterators have recorded positions. If one of them has
} else if (const auto TruthVal = RetVal.getAs<nonloc::ConcreteInt>()) {		// not then create a new symbol for the offset.
if ((State = processComparison(		SymbolRef Sym;
		bjope: This is an uninitialized version of `Sym` that will be used on line 835 and line 839. The `Sym` variable assigned on line 828 is local to the scope starting at line 826.

		Not really sure, but perhaps the idea was to use the `Sym` value from line 828 on lines 835 and 839. Then I guess you can rewrite this as:

		  // At least one of the iterators have recorded positions. If one of them has
		  // not then create a new symbol for the offset.
		  if (!LPos || !RPos) {
		    auto &SymMgr = C.getSymbolManager();
		    auto Sym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
		                                    C.getASTContext().LongTy, C.blockCount());
		    State = assumeNoOverflow(State, Sym, 4);
		    if (!LPos) {
		      State = setIteratorPosition(State, LVal,
		                                  IteratorPosition::getPosition(Cont, Sym));
		      LPos = getIteratorPosition(State, LVal);
		    } else if (!RPos) {
		      State = setIteratorPosition(State, RVal,
		                                  IteratorPosition::getPosition(Cont, Sym));
		      RPos = getIteratorPosition(State, RVal);
		    }
		  }

		As it is right now I get a complaint about using an uninitialized value for `Sym` (so this patch still doesn't build with -Werror after the earlier fixup).
State, getRegionOrSymbol(LVal), getRegionOrSymbol(RVal),		if (!LPos || !RPos) {
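		The shadowing that bjope describes can be reproduced outside the analyzer. The following is a minimal standalone sketch, not code from the patch: `conjure`, `shadowed` and `assigned` are hypothetical names, and the outer variable is given a sentinel value only so that the example has defined behavior (in the patch it is left uninitialized).

```cpp
#include <cassert>

// Hypothetical stand-in for SymbolManager::conjureSymbol().
static int conjure() { return 42; }

// Mirrors the shape of the patch: the inner `auto Sym` declares a new
// variable that shadows the outer one, so the outer `Sym` never changes.
int shadowed(bool NeedSymbol) {
  int Sym = -1; // sentinel; uninitialized in the patch under review
  if (NeedSymbol) {
    auto Sym = conjure(); // new variable, local to this block
    (void)Sym;            // silence the unused-variable warning
  }
  return Sym; // still the sentinel
}

// Fixed variant: assign to the outer variable instead of redeclaring it.
int assigned(bool NeedSymbol) {
  int Sym = -1;
  if (NeedSymbol) {
    Sym = conjure();
  }
  return Sym;
}
```

		Calling `shadowed(true)` yields the sentinel while `assigned(true)` yields the conjured value. Either assigning to the outer variable or moving the uses into the inner scope, as suggested in the comment, avoids the problem.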
(Op == OO_EqualEqual) == (TruthVal->getValue() != 0)))) {		auto &SymMgr = C.getSymbolManager();
		auto Sym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
		C.getASTContext().LongTy, C.blockCount());
		State = assumeNoOverflow(State, Sym, 4);
		}

		NoQ: Welcome to the `addTransition()` hell! Each of the `assignToContainer()` may add a transition, and then `processComparison()` also adds transitions. I suspect that it may lead to more state splits than were intended. I.e., the execution path on which the iterator is assigned to a container would be different from the two execution paths on which the comparison was processed.

		You can chain `addTransition()`s to each other, e.g.:

		  // Return the node produced by the inner addTransition()
		  ExplodedNode N = assignToContainer(...);
		  // And then in processComparison(N, ...)
		  C.addTransition(N->getState()->assume(ConditionVal, false), N);
		  C.addTransition(N->getState()->assume(ConditionVal, true), N);

		It should also be possible to avoid transitions until the final state is computed, if the code is easy enough to refactor this way:

		  // No addTransition() calls within, just produce the state
		  ProgramStateRef State = assignToContainer(...);
		  // And then in processComparison(N, ...)
		  C.addTransition(State->assume(ConditionVal, false), N);
		  C.addTransition(State->assume(ConditionVal, true), N);

		This sort of stuff can be tested via `clang_analyzer_numTimesReached()` - see if you made exactly as many state splits as you wanted to.
		NoQ: My feel is that a better `.addTransition()` API should capture the user's intent more straightforwardly, so that we could check dynamically that the resulting topology is indeed exactly what the user expects. I.e., produce multiple narrow-purpose APIs for common patterns: `C.updateState(State)`, `C.splitState(State1, State2, ..., StateN)` - both would fail if there were previous transitions in the same `CheckerContext` or if more transitions are made after them. The `updateState()` variant should probably try to lazily collapse multiple updates into a single node.

		Maybe instead don't require all branches to be specified simultaneously, i.e. instead do `addBranch(State)` that wouldn't fail in presence of other branches but would still conflict with `updateState()`.

		These narrow-purpose APIs are too clumsy to cover the current use-case, but at least they would've caught the problem. Maybe a better design could make it also comfortable to use.
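		The difference between chained and unchained transitions can be illustrated with a toy graph. This is only a sketch under simplifying assumptions: `ToyGraph` is a hypothetical stand-in that records parent links and is not the analyzer's `ExplodedGraph`, and states are omitted entirely. It only counts how many execution paths (leaves) each pattern produces.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for the exploded graph: each node only remembers its parent.
struct ToyGraph {
  std::vector<int> Parent; // Parent[i] is the predecessor of node i, -1 for the root

  int addRoot() {
    Parent.push_back(-1);
    return static_cast<int>(Parent.size()) - 1;
  }

  // Analogue of C.addTransition(State, Pred): creates a successor of Pred.
  int addTransition(int Pred) {
    Parent.push_back(Pred);
    return static_cast<int>(Parent.size()) - 1;
  }

  // A leaf is a node that no other node names as its parent.
  std::size_t countLeaves() const {
    std::vector<bool> HasChild(Parent.size(), false);
    for (int P : Parent)
      if (P >= 0)
        HasChild[P] = true;
    std::size_t Leaves = 0;
    for (bool B : HasChild)
      if (!B)
        ++Leaves;
    return Leaves;
  }
};

// Helper and comparison both branch off the same predecessor: three paths.
std::size_t unchained() {
  ToyGraph G;
  int Pred = G.addRoot();
  G.addTransition(Pred); // helper records the new iterator position
  G.addTransition(Pred); // comparison assumed true
  G.addTransition(Pred); // comparison assumed false
  return G.countLeaves();
}

// The comparison branches off the node produced by the helper: two paths.
std::size_t chained() {
  ToyGraph G;
  int Pred = G.addRoot();
  int N = G.addTransition(Pred); // helper's node becomes the new predecessor
  G.addTransition(N);            // comparison assumed true
  G.addTransition(N);            // comparison assumed false
  return G.countLeaves();
}
```

		In this model `unchained()` produces three leaves and `chained()` produces two, which matches the extra path described in the comment above. In a real analyzer test the same count would be checked with `clang_analyzer_numTimesReached()`.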
		if (!LPos) {
		State = setIteratorPosition(State, LVal,
		IteratorPosition::getPosition(Cont, Sym));
		LPos = getIteratorPosition(State, LVal);
		} else if (!RPos) {
		State = setIteratorPosition(State, RVal,
		IteratorPosition::getPosition(Cont, Sym));
		RPos = getIteratorPosition(State, RVal);
		}

		processComparison(C, State, LPos->getOffset(), RPos->getOffset(), RetVal, Op);
		}

		void IteratorChecker::processComparison(CheckerContext &C,
		ProgramStateRef State, SymbolRef Sym1,
		SymbolRef Sym2, const SVal &RetVal,
		OverloadedOperatorKind Op) const {
		if (const auto TruthVal = RetVal.getAs<nonloc::ConcreteInt>()) {
		if (State = relateSymbols(State, Sym1, Sym2,
		(Op == OO_EqualEqual) ==
		(TruthVal->getValue() != 0))) {
C.addTransition(State);		C.addTransition(State);
} else {		} else {
C.generateSink(State, C.getPredecessor());		C.generateSink(State, C.getPredecessor());
}		}
		return;
		}

		const auto ConditionVal = RetVal.getAs<DefinedSVal>();
		if (!ConditionVal)
		return;

		if (auto StateTrue = relateSymbols(State, Sym1, Sym2, Op == OO_EqualEqual)) {
		StateTrue = StateTrue->assume(*ConditionVal, true);
		C.addTransition(StateTrue);
		}

		if (auto StateFalse = relateSymbols(State, Sym1, Sym2, Op != OO_EqualEqual)) {
		StateFalse = StateFalse->assume(*ConditionVal, false);
		C.addTransition(StateFalse);
}		}
}		}
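		The eager split performed by the new `processComparison()` can be modeled in a few lines. The sketch below is a simplified illustration, not the checker's code: `ToyState`, `Relation`, `relate` and `split` are hypothetical names, and the constraint manager is reduced to a single three-valued fact about the two offsets.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// Toy program state: what is known about the relation of two offsets.
enum class Relation { Unknown, Equal, NotEqual };

struct ToyState {
  Relation Offsets = Relation::Unknown;
  std::optional<bool> Condition; // assumed truth value of the comparison
};

// Stand-in for relateSymbols(): returns no state if the new fact
// contradicts what is already known.
std::optional<ToyState> relate(ToyState S, bool Equal) {
  Relation Wanted = Equal ? Relation::Equal : Relation::NotEqual;
  if (S.Offsets != Relation::Unknown && S.Offsets != Wanted)
    return std::nullopt;
  S.Offsets = Wanted;
  return S;
}

// Eager split for a comparison whose result is not yet known.
// IsEqualityOp is true for `==` and false for `!=`.
std::vector<ToyState> split(const ToyState &S, bool IsEqualityOp) {
  std::vector<ToyState> Successors;
  if (auto True = relate(S, IsEqualityOp)) {
    True->Condition = true; // branch on which the comparison holds
    Successors.push_back(*True);
  }
  if (auto False = relate(S, !IsEqualityOp)) {
    False->Condition = false; // branch on which it does not hold
    Successors.push_back(*False);
  }
  return Successors;
}
```

		With nothing known about the offsets, `split` yields two successors, one per branch. If the offsets are already known to differ, an `==` comparison yields only the false branch, which corresponds to `relateSymbols()` returning a null state for the infeasible side.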

void IteratorChecker::verifyDereference(CheckerContext &C,		void IteratorChecker::verifyDereference(CheckerContext &C,
const SVal &Val) const {		const SVal &Val) const {
auto State = C.getState();		auto State = C.getState();
const auto *Pos = getIteratorPosition(State, Val);		const auto *Pos = getIteratorPosition(State, Val);
if (Pos && isPastTheEnd(State, *Pos)) {		if (Pos && isPastTheEnd(State, *Pos)) {
[47 lines not shown]	if (Pos) {
const auto NewPos =		const auto NewPos =
advancePosition(C, OO_Minus, *Pos,		advancePosition(C, OO_Minus, *Pos,
nonloc::ConcreteInt(BVF.getValue(llvm::APSInt::get(1))));		nonloc::ConcreteInt(BVF.getValue(llvm::APSInt::get(1))));
State = setIteratorPosition(State, Iter, NewPos);		State = setIteratorPosition(State, Iter, NewPos);
State = setIteratorPosition(State, RetVal, Postfix ? *Pos : NewPos);		State = setIteratorPosition(State, RetVal, Postfix ? *Pos : NewPos);
C.addTransition(State);		C.addTransition(State);
}		}
}		}

// This function tells the analyzer's engine that symbols produced by our
// checker, most notably iterator positions, are relatively small.
// A distance between items in the container should not be very large.
// By assuming that it is within around 1/8 of the address space,
// we can help the analyzer perform operations on these symbols
// without being afraid of integer overflows.
// FIXME: Should we provide it as an API, so that all checkers could use it?
static ProgramStateRef assumeNoOverflow(ProgramStateRef State, SymbolRef Sym,
long Scale) {
SValBuilder &SVB = State->getStateManager().getSValBuilder();
BasicValueFactory &BV = SVB.getBasicValueFactory();

QualType T = Sym->getType();
assert(T->isSignedIntegerOrEnumerationType());
APSIntType AT = BV.getAPSIntType(T);

ProgramStateRef NewState = State;

llvm::APSInt Max = AT.getMaxValue() / AT.getValue(Scale);
SVal IsCappedFromAbove =
SVB.evalBinOpNN(State, BO_LE, nonloc::SymbolVal(Sym),
nonloc::ConcreteInt(Max), SVB.getConditionType());
if (auto DV = IsCappedFromAbove.getAs<DefinedSVal>()) {
NewState = NewState->assume(*DV, true);
if (!NewState)
return State;
}

llvm::APSInt Min = -Max;
SVal IsCappedFromBelow =
SVB.evalBinOpNN(State, BO_GE, nonloc::SymbolVal(Sym),
nonloc::ConcreteInt(Min), SVB.getConditionType());
if (auto DV = IsCappedFromBelow.getAs<DefinedSVal>()) {
NewState = NewState->assume(*DV, true);
if (!NewState)
return State;
}

return NewState;
}
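The bound that `assumeNoOverflow()` places on a symbol is plain integer arithmetic and can be checked in isolation. The sketch below assumes a 64-bit signed offset type (the checker uses the target's `long`), and `noOverflowBounds` is a hypothetical helper name, not part of the patch.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>

// Computes the symmetric interval [-Max, Max] with Max = INT64_MAX / Scale,
// mirroring the arithmetic in assumeNoOverflow(). Scale must be positive.
// A 64-bit signed offset type is an assumption of this sketch.
std::pair<std::int64_t, std::int64_t> noOverflowBounds(std::int64_t Scale) {
  assert(Scale > 0 && "Scale must be positive");
  const std::int64_t Max = std::numeric_limits<std::int64_t>::max() / Scale;
  return {-Max, Max};
}
```

With `Scale` equal to 4, as used by the callers in this diff, `Max` is 2^61 - 1 under the 64-bit assumption, which is roughly 1/8 of a 2^64 address space.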

void IteratorChecker::handleRandomIncrOrDecr(CheckerContext &C,		void IteratorChecker::handleRandomIncrOrDecr(CheckerContext &C,
		NoQ: It looks as if you moved the function but forgot to move the comment.
OverloadedOperatorKind Op,		OverloadedOperatorKind Op,
const SVal &RetVal,		const SVal &RetVal,
const SVal &LHS,		const SVal &LHS,
const SVal &RHS) const {		const SVal &RHS) const {
// Increment or decrement the symbolic expressions which represents the		// Increment or decrement the symbolic expressions which represents the
// position of the iterator		// position of the iterator
auto State = C.getState();		auto State = C.getState();
const auto *Pos = getIteratorPosition(State, LHS);		const auto *Pos = getIteratorPosition(State, LHS);
[186 lines not shown]	void IteratorChecker::handleEnd(CheckerContext &C, const Expr *CE,
}		}
State = setIteratorPosition(State, RetVal,		State = setIteratorPosition(State, RetVal,
IteratorPosition::getPosition(ContReg, EndSym));		IteratorPosition::getPosition(ContReg, EndSym));
C.addTransition(State);		C.addTransition(State);
}		}

void IteratorChecker::assignToContainer(CheckerContext &C, const Expr *CE,		void IteratorChecker::assignToContainer(CheckerContext &C, const Expr *CE,
const SVal &RetVal,		const SVal &RetVal,
const MemRegion *Cont) const {		const MemRegion *Cont) const {
Cont = Cont->getMostDerivedObjectRegion();		Cont = Cont->getMostDerivedObjectRegion();
		Szelethus: You will have to excuse me for commenting on something totally unrelated, but I'm afraid this may cause a crash, if the region returned by `getSuperRegion` is symbolic. I encountered this error when doing the exact same thing in my checker: D50892. Can something like this occur with this checker?
		NoQ: Hmm, had a look at the crash you mention here. Your code crashed because you additionally did `getAs<TypedValueRegion>`, which would turn your pointer into a null when a symbolic region is encountered. So the final code ended up a bit weird: there's no need to check that it's a `TypedValueRegion` before you check that it's a `CXXBaseObjectRegion`; just check for the latter directly. This code looks correct in this sense. Also, since this code keeps getting copied around, do we need a better helper function for this unwrap idiom? I.e., something like `MemRegion::StripCasts()` that only unwraps derived-to-base casts?

auto State = C.getState();		auto State = C.getState();
auto &SymMgr = C.getSymbolManager();		auto &SymMgr = C.getSymbolManager();
auto Sym = SymMgr.conjureSymbol(CE, C.getLocationContext(),		auto Sym = SymMgr.conjureSymbol(CE, C.getLocationContext(),
C.getASTContext().LongTy, C.blockCount());		C.getASTContext().LongTy, C.blockCount());
State = assumeNoOverflow(State, Sym, 4);		State = assumeNoOverflow(State, Sym, 4);
State = setIteratorPosition(State, RetVal,		State = setIteratorPosition(State, RetVal,
IteratorPosition::getPosition(Cont, Sym));		IteratorPosition::getPosition(Cont, Sym));
[660 lines not shown]	bool isDecrementOperator(OverloadedOperatorKind OK) {
return OK == OO_MinusMinus;		return OK == OO_MinusMinus;
}		}

bool isRandomIncrOrDecrOperator(OverloadedOperatorKind OK) {		bool isRandomIncrOrDecrOperator(OverloadedOperatorKind OK) {
return OK == OO_Plus || OK == OO_PlusEqual || OK == OO_Minus ||		return OK == OO_Plus || OK == OO_PlusEqual || OK == OO_Minus ||
OK == OO_MinusEqual;		OK == OO_MinusEqual;
}		}

BinaryOperator::Opcode getOpcode(const SymExpr *SE) {
if (const auto *BSE = dyn_cast<BinarySymExpr>(SE)) {
return BSE->getOpcode();
} else if (const auto *SC = dyn_cast<SymbolConjured>(SE)) {
const auto *COE = dyn_cast_or_null<CXXOperatorCallExpr>(SC->getStmt());
if (!COE)
return BO_Comma; // Extremal value, neither EQ nor NE
if (COE->getOperator() == OO_EqualEqual) {
return BO_EQ;
} else if (COE->getOperator() == OO_ExclaimEqual) {
return BO_NE;
}
return BO_Comma; // Extremal value, neither EQ nor NE
}
return BO_Comma; // Extremal value, neither EQ nor NE
}

bool hasSubscriptOperator(ProgramStateRef State, const MemRegion *Reg) {		bool hasSubscriptOperator(ProgramStateRef State, const MemRegion *Reg) {
const auto *CRD = getCXXRecordDecl(State, Reg);		const auto *CRD = getCXXRecordDecl(State, Reg);
if (!CRD)		if (!CRD)
return false;		return false;

for (const auto *Method : CRD->methods()) {		for (const auto *Method : CRD->methods()) {
if (!Method->isOverloadedOperator())		if (!Method->isOverloadedOperator())
continue;		continue;
[44 lines not shown]	const CXXRecordDecl *getCXXRecordDecl(ProgramStateRef State,
auto Type = TI.getType();		auto Type = TI.getType();
if (const auto *RefT = Type->getAs<ReferenceType>()) {		if (const auto *RefT = Type->getAs<ReferenceType>()) {
Type = RefT->getPointeeType();		Type = RefT->getPointeeType();
}		}

return Type->getUnqualifiedDesugaredType()->getAsCXXRecordDecl();		return Type->getUnqualifiedDesugaredType()->getAsCXXRecordDecl();
}		}

const RegionOrSymbol getRegionOrSymbol(const SVal &Val) {
if (const auto Reg = Val.getAsRegion()) {
return Reg;
} else if (const auto Sym = Val.getAsSymbol()) {
return Sym;
} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {
return LCVal->getRegion();
}
return RegionOrSymbol();
}

const ProgramStateRef processComparison(ProgramStateRef State,
RegionOrSymbol LVal,
RegionOrSymbol RVal, bool Equal) {
const auto *LPos = getIteratorPosition(State, LVal);
const auto *RPos = getIteratorPosition(State, RVal);
if (LPos && !RPos) {
State = adjustIteratorPosition(State, RVal, *LPos, Equal);
} else if (!LPos && RPos) {
State = adjustIteratorPosition(State, LVal, *RPos, Equal);
} else if (LPos && RPos) {
State = relateIteratorPositions(State, LPos, RPos, Equal);
}
return State;
}

const ProgramStateRef saveComparison(ProgramStateRef State,
const SymExpr *Condition, const SVal &LVal,
const SVal &RVal, bool Eq) {
const auto Left = getRegionOrSymbol(LVal);
const auto Right = getRegionOrSymbol(RVal);
if (!Left || !Right)
return State;
return State->set<IteratorComparisonMap>(Condition,
IteratorComparison(Left, Right, Eq));
}

const IteratorComparison *loadComparison(ProgramStateRef State,
const SymExpr *Condition) {
return State->get<IteratorComparisonMap>(Condition);
}

SymbolRef getContainerBegin(ProgramStateRef State, const MemRegion *Cont) {		SymbolRef getContainerBegin(ProgramStateRef State, const MemRegion *Cont) {
const auto *CDataPtr = getContainerData(State, Cont);		const auto *CDataPtr = getContainerData(State, Cont);
if (!CDataPtr)		if (!CDataPtr)
return nullptr;		return nullptr;

return CDataPtr->getBegin();		return CDataPtr->getBegin();
}		}

[54 lines not shown]	const IteratorPosition *getIteratorPosition(ProgramStateRef State,
} else if (const auto Sym = Val.getAsSymbol()) {		} else if (const auto Sym = Val.getAsSymbol()) {
return State->get<IteratorSymbolMap>(Sym);		return State->get<IteratorSymbolMap>(Sym);
} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {		} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {
return State->get<IteratorRegionMap>(LCVal->getRegion());		return State->get<IteratorRegionMap>(LCVal->getRegion());
}		}
return nullptr;		return nullptr;
}		}

const IteratorPosition *getIteratorPosition(ProgramStateRef State,
RegionOrSymbol RegOrSym) {
if (RegOrSym.is<const MemRegion *>()) {
auto Reg = RegOrSym.get<const MemRegion *>()->getMostDerivedObjectRegion();
return State->get<IteratorRegionMap>(Reg);
} else if (RegOrSym.is<SymbolRef>()) {
return State->get<IteratorSymbolMap>(RegOrSym.get<SymbolRef>());
}
return nullptr;
}

ProgramStateRef setIteratorPosition(ProgramStateRef State, const SVal &Val,		ProgramStateRef setIteratorPosition(ProgramStateRef State, const SVal &Val,
const IteratorPosition &Pos) {		const IteratorPosition &Pos) {
if (auto Reg = Val.getAsRegion()) {		if (auto Reg = Val.getAsRegion()) {
Reg = Reg->getMostDerivedObjectRegion();		Reg = Reg->getMostDerivedObjectRegion();
return State->set<IteratorRegionMap>(Reg, Pos);		return State->set<IteratorRegionMap>(Reg, Pos);
} else if (const auto Sym = Val.getAsSymbol()) {		} else if (const auto Sym = Val.getAsSymbol()) {
return State->set<IteratorSymbolMap>(Sym, Pos);		return State->set<IteratorSymbolMap>(Sym, Pos);
} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {		} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {
return State->set<IteratorRegionMap>(LCVal->getRegion(), Pos);		return State->set<IteratorRegionMap>(LCVal->getRegion(), Pos);
}		}
return nullptr;		return nullptr;
}		}

ProgramStateRef setIteratorPosition(ProgramStateRef State,
RegionOrSymbol RegOrSym,
const IteratorPosition &Pos) {
if (RegOrSym.is<const MemRegion *>()) {
auto Reg = RegOrSym.get<const MemRegion *>()->getMostDerivedObjectRegion();
return State->set<IteratorRegionMap>(Reg, Pos);
} else if (RegOrSym.is<SymbolRef>()) {
return State->set<IteratorSymbolMap>(RegOrSym.get<SymbolRef>(), Pos);
}
return nullptr;
}

ProgramStateRef removeIteratorPosition(ProgramStateRef State, const SVal &Val) {		ProgramStateRef removeIteratorPosition(ProgramStateRef State, const SVal &Val) {
if (auto Reg = Val.getAsRegion()) {		if (auto Reg = Val.getAsRegion()) {
Reg = Reg->getMostDerivedObjectRegion();		Reg = Reg->getMostDerivedObjectRegion();
return State->remove<IteratorRegionMap>(Reg);		return State->remove<IteratorRegionMap>(Reg);
} else if (const auto Sym = Val.getAsSymbol()) {		} else if (const auto Sym = Val.getAsSymbol()) {
return State->remove<IteratorSymbolMap>(Sym);		return State->remove<IteratorSymbolMap>(Sym);
} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {		} else if (const auto LCVal = Val.getAs<nonloc::LazyCompoundVal>()) {
return State->remove<IteratorRegionMap>(LCVal->getRegion());		return State->remove<IteratorRegionMap>(LCVal->getRegion());
}		}
return nullptr;		return nullptr;
}		}

ProgramStateRef adjustIteratorPosition(ProgramStateRef State,		ProgramStateRef relateSymbols(ProgramStateRef State, SymbolRef Sym1,
RegionOrSymbol RegOrSym,		SymbolRef Sym2, bool Equal) {
const IteratorPosition &Pos,
bool Equal) {
if (Equal) {
return setIteratorPosition(State, RegOrSym, Pos);
} else {
return State;
}
}

ProgramStateRef relateIteratorPositions(ProgramStateRef State,
const IteratorPosition &Pos1,
const IteratorPosition &Pos2,
bool Equal) {
auto &SVB = State->getStateManager().getSValBuilder();		auto &SVB = State->getStateManager().getSValBuilder();

// FIXME: This code should be reworked as follows:		// FIXME: This code should be reworked as follows:
// 1. Subtract the operands using evalBinOp().		// 1. Subtract the operands using evalBinOp().
// 2. Assume that the result doesn't overflow.		// 2. Assume that the result doesn't overflow.
// 3. Compare the result to 0.		// 3. Compare the result to 0.
// 4. Assume the result of the comparison.		// 4. Assume the result of the comparison.
const auto comparison =		const auto comparison =
SVB.evalBinOp(State, BO_EQ, nonloc::SymbolVal(Pos1.getOffset()),		SVB.evalBinOp(State, BO_EQ, nonloc::SymbolVal(Sym1),
nonloc::SymbolVal(Pos2.getOffset()),		nonloc::SymbolVal(Sym2), SVB.getConditionType());
SVB.getConditionType());

assert(comparison.getAs<DefinedSVal>() &&		assert(comparison.getAs<DefinedSVal>() &&
"Symbol comparison must be a `DefinedSVal`");		"Symbol comparison must be a `DefinedSVal`");

auto NewState = State->assume(comparison.castAs<DefinedSVal>(), Equal);		auto NewState = State->assume(comparison.castAs<DefinedSVal>(), Equal);
		if (!NewState)
		return nullptr;
Szelethus (inline comment): It isn't obvious to me (at first) what happens here. Maybe document when this function will return `nullptr`? When `relateSymbols` is called and the returned value is checked for null, one could think that this symbolizes some sort of failure.

if (const auto CompSym = comparison.getAsSymbol()) {		if (const auto CompSym = comparison.getAsSymbol()) {
assert(isa<SymIntExpr>(CompSym) &&		assert(isa<SymIntExpr>(CompSym) &&
"Symbol comparison must be a `SymIntExpr`");		"Symbol comparison must be a `SymIntExpr`");
assert(BinaryOperator::isComparisonOp(		assert(BinaryOperator::isComparisonOp(
cast<SymIntExpr>(CompSym)->getOpcode()) &&		cast<SymIntExpr>(CompSym)->getOpcode()) &&
"Symbol comparison must be a comparison");		"Symbol comparison must be a comparison");
return assumeNoOverflow(NewState, cast<SymIntExpr>(CompSym)->getLHS(), 2);		return assumeNoOverflow(NewState, cast<SymIntExpr>(CompSym)->getLHS(), 2);
NoQ (inline comment): P.S. What was the idea here? Like, `CompSym` was just computed via `BO_EQ` and has the type of a condition, i.e. `bool`, because we are in C++. Is this code trying to say that the result of the comparison is bounded by `true/2`?
baloghadamsoftware (author, inline comment): There is also a `->getLHS()`, which means that we enforce the bound on the left-hand side of the rearranged comparison. Although both symbols are bounded by `max/4`, the constraint manager does not infer that the sum/diff is bounded by `max/2`. I have to enforce this manually to prevent `min` from being negated to `min` when the constraint manager negates the difference.
NoQ (inline comment): Ouch, right, didn't notice `getLHS()`, sorry!
}		}

return NewState;		return NewState;
}		}

bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont) {		bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont) {
auto RegionMap = State->get<IteratorRegionMap>();		auto RegionMap = State->get<IteratorRegionMap>();
for (const auto Reg : RegionMap) {		for (const auto Reg : RegionMap) {
[17 lines not shown]	if (const auto LCVal = Binding.second.getAs<nonloc::LazyCompoundVal>()) {
if (LCVal->getRegion() == Reg)		if (LCVal->getRegion() == Reg)
return true;		return true;
}		}
}		}

return false;		return false;
}		}

		// This function tells the analyzer's engine that symbols produced by our
		// checker, most notably iterator positions, are relatively small.
		// A distance between items in the container should not be very large.
		// By assuming that it is within around 1/8 of the address space,
		// we can help the analyzer perform operations on these symbols
		// without being afraid of integer overflows.
		// FIXME: Should we provide it as an API, so that all checkers could use it?
		ProgramStateRef assumeNoOverflow(ProgramStateRef State, SymbolRef Sym,
		long Scale) {
		SValBuilder &SVB = State->getStateManager().getSValBuilder();
		BasicValueFactory &BV = SVB.getBasicValueFactory();

		QualType T = Sym->getType();
		assert(T->isSignedIntegerOrEnumerationType());
		APSIntType AT = BV.getAPSIntType(T);

		ProgramStateRef NewState = State;

		llvm::APSInt Max = AT.getMaxValue() / AT.getValue(Scale);
		SVal IsCappedFromAbove =
		SVB.evalBinOpNN(State, BO_LE, nonloc::SymbolVal(Sym),
		nonloc::ConcreteInt(Max), SVB.getConditionType());
		if (auto DV = IsCappedFromAbove.getAs<DefinedSVal>()) {
		NewState = NewState->assume(*DV, true);
		if (!NewState)
		return State;
		}

		llvm::APSInt Min = -Max;
		SVal IsCappedFromBelow =
		SVB.evalBinOpNN(State, BO_GE, nonloc::SymbolVal(Sym),
		nonloc::ConcreteInt(Min), SVB.getConditionType());
		if (auto DV = IsCappedFromBelow.getAs<DefinedSVal>()) {
		NewState = NewState->assume(*DV, true);
		if (!NewState)
		return State;
		}

		return NewState;
		}
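
The arithmetic behind the `Scale` argument, as described in the inline discussion, can be sketched as follows. This is an illustration only: it assumes a 64-bit signed symbol type, and `Bound`, `capFor` and `negationIsSafe` are hypothetical helpers, not analyzer code. It shows that a symmetric cap of `max / Scale` excludes the minimum value (whose negation wraps back to itself in two's complement), and that two operands capped at `max/4` have a sum or difference that fits within `max/2`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper type: a symmetric range [-Max, Max].
struct Bound {
  std::int64_t Min;
  std::int64_t Max;
};

// Mirrors `Max = AT.getMaxValue() / AT.getValue(Scale)` and `Min = -Max`
// for an assumed 64-bit signed type.
constexpr Bound capFor(std::int64_t Scale) {
  return Bound{-(std::numeric_limits<std::int64_t>::max() / Scale),
               std::numeric_limits<std::int64_t>::max() / Scale};
}

// Negating the minimum value is the one case that cannot be represented;
// every other value negates safely.
constexpr bool negationIsSafe(std::int64_t Value) {
  return Value != std::numeric_limits<std::int64_t>::min();
}
```

Because the range is symmetric, every value inside it has a representable negation, which is the property the author says must be enforced on the left-hand side of the rearranged comparison.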

template <typename Condition, typename Process>		template <typename Condition, typename Process>
ProgramStateRef processIteratorPositions(ProgramStateRef State, Condition Cond,		ProgramStateRef processIteratorPositions(ProgramStateRef State, Condition Cond,
Process Proc) {		Process Proc) {
auto &RegionMapFactory = State->get_context<IteratorRegionMap>();		auto &RegionMapFactory = State->get_context<IteratorRegionMap>();
auto RegionMap = State->get<IteratorRegionMap>();		auto RegionMap = State->get<IteratorRegionMap>();
bool Changed = false;		bool Changed = false;
for (const auto Reg : RegionMap) {		for (const auto Reg : RegionMap) {
if (Cond(Reg.second)) {		if (Cond(Reg.second)) {
[248 lines not shown]


Revision Contents

Diff 196190
