This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
lib/StaticAnalyzer/Checkers/
-
StaticAnalyzer/
-
Checkers/
1
IteratorChecker.cpp

Differential D48427

[Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its last iterator is cleaned up
ClosedPublic

Authored by baloghadamsoftware on Jun 21 2018, 6:21 AM.

Download Raw Diff

Details

Reviewers

NoQ
rnkovacs
george.karpenkov

Commits

rGa692120cb76b: [Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its…
rL338234: [Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its…
rC338234: [Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its…

Summary

D47417 is a fix for an accidentally missing transition. However, if we apply that fix, the checker will remove data from the GDM which is still needed. In this fix we defer its removing while there are iterators using it.

Diff Detail

Event Timeline

baloghadamsoftware created this revision.Jun 21 2018, 6:21 AM

Herald added a reviewer: george.karpenkov. · View Herald TranscriptJun 21 2018, 6:21 AM

Herald added subscribers: mikhail.ramalho, a.sidorin, dkrupp and 3 others. · View Herald Transcript

baloghadamsoftware added a parent revision: D47417: [analyzer] Add missing state transition in IteratorChecker.Jun 21 2018, 6:22 AM

george.karpenkov resigned from this revision.Jun 26 2018, 5:52 PM

Aha, ok, yeah, we should have seen this coming. Whenever a checker tracks pairs of objects, like containers and iterators, or strings objects and their internal buffers (D48522), or return values and out-parameters of the same function call (D32449), we should expect races on which of the two dies first. Such races are inevitable because any of the two may be arbitrarily stored for an indefinite amount of time. The test that you see failing is one of the tricky cases to understand because it's about liveness of a parameter region, which is a bit counter-intuitive.

I believe that the checker should deal with symbol/region lifetime races, not try to prevent them. The checker should only extend the lifetime of items it tracks in checkLiveSymbols() when the checker has additional information about how the programmer can access the tracked object despite not having a direct reference to it in the imperfect symbolic memory. Otherwise we're sacrificing our fairly precise liveness tracking, which serves two important purposes:

finding leaks (eg., if the container is allocated on the heap, we may get a false negative leak when an iterator suddenly starts keeping it alive) and

keeping states small for faster lookups (even though state cleanups is a top performance bottleneck that occupies 30-50% of the analyzer's performance, removing cleanups only makes things worse because state lookups become slower).

When solving liveness races, the key understanding is that when the object becomes dead, all information we gathered during analysis about the object is final. Such information will never become more precise, and it will never mutate. For instance, because the object cannot be accessed anymore during the current analysis, its begin() and end() locations will never change. The "during the current analysis" part is important: for example, in your test simple_bad_end(), the vector is marked as dead even though the previous stack frame definitely still has access to it, because the analysis is only conducted within the current stack frame and will never leave it. If we started our analysis from the caller, the vector would not have been marked as dead yet.

Therefore i believe that when the container goes out of scope, the iterator should be "disconnected" from the container, and any information we've had about the container should now be associated with the iterator itself. In your case it probably means connecting the iterator directly to the begin/end symbols. That puts a bit of stress into your program state trait structure because there are now two sorts of iterator info structures to handle (one that directs all queries to the container, another that has all the info within it directly). But that's the usual cost of handling a liveness race - cf. "Schrödinger mutex states" in D32449, which are only as different from your case as the nature of the objects you're tracking is different.

I'm slightly worried that it might potentially be possible to mutate the container significantly by only using an interator, without having access to the object itself. It doesn't seem possible - at most you'll be able to mutate the element to which the iterator points, but you can't eg. change the size of the container or invalidate other iterators. If you can in fact mutate the container significantly through an iterator, then we'll need to re-think this, and enforcing liveness may be the right thing to do (though we may also get away with simply keeping a dead region in the program state solely for convenience).

george.karpenkov added a subscriber: george.karpenkov.Jun 27 2018, 1:22 PM

Hmm, then the only solution that comes to my mind is to link iterator positions to container data instead of the container regions. I also have to store a link to the class definition of the container in the container data for the invalidation rules. Container data can then be detached from the container region until the last iterator to the container is clean up. Is it OK?

That'd be a hell for you because when the container is updated you won't be able to easily find all iterators that iterate over it. Normally what you want to do is keep mapping iterators to container regions, and when the region dies, "freeze" the data (make sure it can no longer be mutated, through an assertion or, ideally, via the type system) and re-map the iterator to data directly.

I guess you need to track dead containers somehow anyway because you want to find bugs that consist in using iterators from different containers:

void foo(vector<int> x, vector<int> y) {
  vector::iterator i = x.begin(), e = y.end();   // Both containers are garbage-collected after this statement.
  vector::iterator k = find(i, j, 123);          // Bug: 'i' and 'j' aren't from the same container.
}

For that purpose i'm fine with tracking dead regions (as long as they aren't marked live, just stored). Or, for enforcing correctness, you could "anonymize" them (i.e., in the container data add an immutable int field that says "this is container #5" and use these ids for your check, then recover the original container region from the identifier in the bug visitor); but i don't think it's necessary.

Instead of marking the container alive, now we defer deletion of the container data until all its iterators are cleaned up.

Adding of transition removed since it is part of D47417.

baloghadamsoftware removed a parent revision: D47417: [analyzer] Add missing state transition in IteratorChecker.Jul 4 2018, 2:02 AM

baloghadamsoftware added a child revision: D47417: [analyzer] Add missing state transition in IteratorChecker.

baloghadamsoftware retitled this revision from [Analyzer] Fix for D47417 to make the tests pass to [Analyzer] Defer deletion of container data until its last iterator is cleaned up.Jul 6 2018, 1:44 AM

baloghadamsoftware edited the summary of this revision. (Show Details)

baloghadamsoftware retitled this revision from [Analyzer] Defer deletion of container data until its last iterator is cleaned up to [Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its last iterator is cleaned up.

Yep, that'll do, thanks! Sorry for not keeping up with all the incoming reviews.

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp
536–538	Just let's explain that, because it's something very unobvious about the code. Like, anybody who reads that is allowed to ask "omg why are they doing it??", and it's a good indication that we need a comment. We keep container information around because the iterator's container identity is anyway essential to understanding the iterator, and that causes us to keep a dead container around in the iterator map values, and we might as well keep it in the container map keys to avoid changing how we represent the iterator for no good reason.

This revision is now accepted and ready to land.Jul 25 2018, 4:42 PM

Closed by commit rC338234: [Analyzer] Iterator Checker Hotfix: Defer deletion of container data until its… (authored by baloghadamsoftware). · Explain WhyJul 30 2018, 1:52 AM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path

Size

lib/

StaticAnalyzer/

Checkers/

IteratorChecker.cpp

23 lines

Diff 153907

lib/StaticAnalyzer/Checkers/IteratorChecker.cpp

Show First 20 Lines • Show All 285 Lines • ▼ Show 20 Lines
ProgramStateRef relateIteratorPositions(ProgramStateRef State,		ProgramStateRef relateIteratorPositions(ProgramStateRef State,
const IteratorPosition &Pos1,		const IteratorPosition &Pos1,
const IteratorPosition &Pos2,		const IteratorPosition &Pos2,
bool Equal);		bool Equal);
const ContainerData *getContainerData(ProgramStateRef State,		const ContainerData *getContainerData(ProgramStateRef State,
const MemRegion *Cont);		const MemRegion *Cont);
ProgramStateRef setContainerData(ProgramStateRef State, const MemRegion *Cont,		ProgramStateRef setContainerData(ProgramStateRef State, const MemRegion *Cont,
const ContainerData &CData);		const ContainerData &CData);
		bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont);
bool isOutOfRange(ProgramStateRef State, const IteratorPosition &Pos);		bool isOutOfRange(ProgramStateRef State, const IteratorPosition &Pos);
bool isZero(ProgramStateRef State, const NonLoc &Val);		bool isZero(ProgramStateRef State, const NonLoc &Val);
} // namespace		} // namespace

IteratorChecker::IteratorChecker() {		IteratorChecker::IteratorChecker() {
OutOfRangeBugType.reset(		OutOfRangeBugType.reset(
new BugType(this, "Iterator out of range", "Misuse of STL APIs"));		new BugType(this, "Iterator out of range", "Misuse of STL APIs"));
OutOfRangeBugType->setSuppressOnSink(true);		OutOfRangeBugType->setSuppressOnSink(true);
▲ Show 20 Lines • Show All 225 Lines • ▼ Show 20 Lines	for (const auto Sym : SymbolMap) {
if (!SR.isLive(Sym.first)) {		if (!SR.isLive(Sym.first)) {
State = State->remove<IteratorSymbolMap>(Sym.first);		State = State->remove<IteratorSymbolMap>(Sym.first);
}		}
}		}

auto ContMap = State->get<ContainerMap>();		auto ContMap = State->get<ContainerMap>();
for (const auto Cont : ContMap) {		for (const auto Cont : ContMap) {
if (!SR.isLiveRegion(Cont.first)) {		if (!SR.isLiveRegion(Cont.first)) {
		if (!hasLiveIterators(State, Cont.first)) {
State = State->remove<ContainerMap>(Cont.first);		State = State->remove<ContainerMap>(Cont.first);
}		}
		NoQUnsubmitted Not Done Reply Inline Actions Just let's explain that, because it's something very unobvious about the code. Like, anybody who reads that is allowed to ask "omg why are they doing it??", and it's a good indication that we need a comment. We keep container information around because the iterator's container identity is anyway essential to understanding the iterator, and that causes us to keep a dead container around in the iterator map values, and we might as well keep it in the container map keys to avoid changing how we represent the iterator for no good reason. NoQ: Just let's explain that, because it's something very unobvious about the code. Like, anybody…
}		}
		}

auto ComparisonMap = State->get<IteratorComparisonMap>();		auto ComparisonMap = State->get<IteratorComparisonMap>();
for (const auto Comp : ComparisonMap) {		for (const auto Comp : ComparisonMap) {
if (!SR.isLive(Comp.first)) {		if (!SR.isLive(Comp.first)) {
State = State->remove<IteratorComparisonMap>(Comp.first);		State = State->remove<IteratorComparisonMap>(Comp.first);
}		}
}		}

		C.addTransition(State);
}		}

ProgramStateRef IteratorChecker::evalAssume(ProgramStateRef State, SVal Cond,		ProgramStateRef IteratorChecker::evalAssume(ProgramStateRef State, SVal Cond,
bool Assumption) const {		bool Assumption) const {
// Load recorded comparison and transfer iterator state between sides		// Load recorded comparison and transfer iterator state between sides
// according to comparison operator and assumption		// according to comparison operator and assumption
const auto *SE = Cond.getAsSymExpr();		const auto *SE = Cond.getAsSymExpr();
if (!SE)		if (!SE)
▲ Show 20 Lines • Show All 616 Lines • ▼ Show 20 Lines	if (comparison) {
}		}

return NewState;		return NewState;
}		}

return State;		return State;
}		}

		bool hasLiveIterators(ProgramStateRef State, const MemRegion *Cont) {
		auto RegionMap = State->get<IteratorRegionMap>();
		for (const auto Reg : RegionMap) {
		if (Reg.second.getContainer() == Cont)
		return true;
		}

		auto SymbolMap = State->get<IteratorSymbolMap>();
		for (const auto Sym : SymbolMap) {
		if (Sym.second.getContainer() == Cont)
		return true;
		}

		return false;
		}

bool isZero(ProgramStateRef State, const NonLoc &Val) {		bool isZero(ProgramStateRef State, const NonLoc &Val) {
auto &BVF = State->getBasicVals();		auto &BVF = State->getBasicVals();
return compare(State, Val,		return compare(State, Val,
nonloc::ConcreteInt(BVF.getValue(llvm::APSInt::get(0))),		nonloc::ConcreteInt(BVF.getValue(llvm::APSInt::get(0))),
BO_EQ);		BO_EQ);
}		}

bool isOutOfRange(ProgramStateRef State, const IteratorPosition &Pos) {		bool isOutOfRange(ProgramStateRef State, const IteratorPosition &Pos) {
▲ Show 20 Lines • Show All 65 Lines • Show Last 20 Lines