This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix the "Zombie symbols" issue.
ClosedPublic

Authored by NoQ on Apr 7 2016, 7:12 AM.

Details

Summary

This patch fixes problems in multiple leak-like static analyzer checkers, which failed to mark symbols managed by them as possibly-dead, and due to that the leak was not detected even though the symbol was removed from the rest of the program state.

Original discussion in cfe-dev started here: http://lists.llvm.org/pipermail/cfe-dev/2016-March/047922.html

The patch implements the solution proposed by Ted Kremenek and described in cfe-dev here: http://lists.llvm.org/pipermail/cfe-dev/2016-March/048215.html

The patch eliminates the "dead set" from the SymbolReaper as a whole, because we now assume that the set of dead symbols is potentially infinite. Hence,dead_iterator, dead_begin(), dead_end() are removed. The maybeDead() API, which was used for marking symbols dead, is removed in order to avoid confusion. The hasDeadSymbols() function is removed as well. The isDead() is modified to mean !isLive().

In case API break is undesired, there's no problem bringing maybeDead() back as a synonym to isDead(), and define hasDeadSymbols() to always return true.

The patch passes the new test for SimpleStreamChecker described in the mailing lists above. It also fixes a FIXME test in unions.cpp, and throws a new warning in pr22954.c.

The new warning in pr22954.c seems correct with the current ideology, however we should probably eventually fix it. My reasoning for this is as follows. The pointer gets lost upon invalidation of m28[j].s4 after binding to l28->s1[l] which in fact aliases to m28[1].s3[l]. According to docs/analyzer/RegionStore.txt, such invalidation is the correct behavior: it is possible that j is equal to 1, and l is out of bounds and the binding actually overwrites one of the bytes in m28[j].s4. So it is the valid behavior of the analyzer to scrap all the bindings to the whole m28 variable region after this bind, which is being tested. However, because it is not *certain* that our pointer gets overwritten, we are not conservative enough when we are suspecting that our pointer was overwritten, so the report is not quite valid, and i may easily imagine false positives (eg. l when is constrained to [0, 3] on that path, we won't be able to handle that). So i should probably mark this as FIXME. However, i don't think this issue is the problem of the current patch; it should be solved separately, either by creating a special kind of PointerEscape to catch such cases, or improving invalidation accuracy by supporting offset constraints (or probably both).

There are a couple of shameful hotfixes in my code in the ExprInspection checker - included here because a problem was exposed by this patch in the symbol-reaper.c test, which now tests ExprInspection deeper.

I avoided fixing some FIXMEs in StreamChecker just around the new code, saving for another patch.

The patch was tested on Android open-source platform source code. I didn't test the patch on any large objective-c codebases, even though the patch alters some objective-c checkers. Performance has slightly degraded (~5%) - hmm, i guess the original approach is very performance-oriented. Also, it has found exactly one(!) new positive, which i'm attaching:

. Not sure if it is a true bug (depends on external factors about the program which the analyzer could not have guessed anyway), but it is expected from the analyzer to find this positive, so i'm happy with this positive. Still funny to see how issues that seem major in description may actually be very minor when it comes to statistical quality.

So, overally, the patch fixes all the issues and simplifies checker API, but i'm not exactly sure we should go this way; probably going directly for smart data map traits would retain the performance while fixing the issue equally well. Probably we can revert this patch when we allow the core to introspect GDM contents and mark symbols dead automatically.

Diff Detail

Repository
rC Clang

Event Timeline

NoQ updated this revision to Diff 52895.Apr 7 2016, 7:12 AM
NoQ retitled this revision from to [analyzer] Fix the "Zombie symbols" issue..
NoQ updated this object.
NoQ edited subscribers, added: cfe-commits, dergachev.a; removed: tberghammer, danalbert.
NoQ updated this object.Apr 7 2016, 7:20 AM
zaks.anna edited edge metadata.Apr 23 2016, 3:51 PM

Thanks for working on this!

The main unfinished task here is to investigate ways of reducing the performance hit. See more info below.

The patch was tested on Android open-source platform source code.

Just to double check, have you compare the pre and after results and all of them but this one issue are the same?

Performance has slightly degraded (~5%) - hmm

5% is a considerable regression :(

lib/StaticAnalyzer/Core/Environment.cpp
200

We are removing this because the maybeDead is no longer used, correct?

lib/StaticAnalyzer/Core/ExprEngine.cpp
685

This removes an optimization to address the following issue:
"removeDeadBindings is not run right after the last reference to the object is lost, which leads to imprecise error reports and failure to report the leak in some cases. It's because of how hasDeadSymbols() is implemented. That problem is solved if we disable the optimization, but I do not know how that effects performance. Maybe we can come up with something more clever.
"
I suspect the removal of this optimization causes the performance regression. In the patch I sent to the list, this was just a hack to demonstrate what causes the issue. I am not sure we should not just remove the optimization... The best proposal I have is to trigger remove dead bindings at the end of every basic block. This would degrade diagnostics (you will see leaks only at the end of the basic block), but should give us performance back or even improve performance.

Artem and Devin, WDYT?

Artem, can you experiment with this and investigate if the diagnostics become much worse? Maybe send a couple of examples? I suggest we implement this mode behind a flag as a separate patch.

lib/StaticAnalyzer/Core/RegionStore.cpp
2591

Looks like we are saying that we should no longer add to maybeDead because it's not used.

What is the status of this patch? Is there a consensus?

MTC added a subscriber: MTC.Mar 27 2018, 10:18 PM

@NoQ can we get the evaluation re-done?

george.karpenkov requested changes to this revision.May 30 2018, 4:58 PM

The change LGTM after nits and rebasing IF the new evaluation will not show a too large regression.

lib/StaticAnalyzer/Checkers/RetainCountChecker.cpp
3925 ↗(On Diff #52895)

for .. in loop?

lib/StaticAnalyzer/Checkers/StreamChecker.cpp
397

foreach loop?

This revision now requires changes to proceed.May 30 2018, 4:58 PM
NoQ updated this revision to Diff 171006.Oct 24 2018, 3:46 PM
NoQ marked 2 inline comments as done.

Rebase!

Changes on large codebase runs still look mild and reasonable, with much less slowdown than before. Even if there's in fact a 5-10% slowdown, i believe we can live with that, as there doesn't seem to be a better solution for the purposes of correctness.

The most important consequence of the rebase was the conflict that lead to test failure in MacOSKeychainAPIChecker on the radar_19196494() test. The story here is that D28330 unconsciously adds a suppression that relies on zombie symbols. I replaced this suppression with another ugly suppression that should be at least a tiny bit in the right direction.

NoQ added inline comments.Oct 24 2018, 3:46 PM
lib/StaticAnalyzer/Core/Environment.cpp
200

Yup.

lib/StaticAnalyzer/Core/ExprEngine.cpp
685

This cannot be kept around because .hasDeadSymbols() cannot be implemented correctly.

lib/StaticAnalyzer/Core/RegionStore.cpp
2591

Yup.

george.karpenkov accepted this revision.Oct 24 2018, 3:50 PM
george.karpenkov added inline comments.
test/Analysis/self-assign.cpp
42

Woo-hoo!

test/Analysis/simple-stream-checks.c
96

Woo-hoo, were we losing this case before?

This revision is now accepted and ready to land.Oct 24 2018, 3:50 PM
NoQ added inline comments.Oct 24 2018, 3:56 PM
test/Analysis/simple-stream-checks.c
96

Yes, this is the whole point of the patch.

NoQ updated this revision to Diff 172263.Nov 1 2018, 4:32 PM

Ha! Apart from Zombie symbols that are neither dead nor alive and therefore not buried properly, there are also Schrödinger symbols that are dead and alive at the same time. Moreover, asking whether a Schrödinger symbol is isLive() or maybeDead() would immediately collapse it into an alive-only state. Moreover, if a checker iterates through the dead set and asks whether a Schrödinger symbol is alive, undefined behavior occurs. Ah, thin ice.

As far as i understand, this only happens to derived symbols under fairly specific circumstances. Namely, if the derived symbol is marked as dead and then its parent symbol is marked as live, the derived symbol is not automatically removed from the "dead set". It is impossible to automatically add derived symbols to the dead set because there may be infinitely many symbols derived from the same parent symbol. Therefore there exists moment of time when the derived symbol is still in the dead set, but asking whether it is isLive() would yield true (because it'll check whether the parent symbol is alive). Additionally, when isLive() notices that it's alive, it automatically removes the symbol from the dead set in order to "memoize" the answer, so it becomes purely alive. In particular, it invalidates dead set iterators, which results in undefined behavior (which in practice boils down to past-end iterator access, ultimately crashing Clang).

Schrödinger symbols cause false positive leaks, unlike zombies who only cause false negative leaks. But those false positives are relatively rare because most checkers don't track derived symbols; eg., MallocChecker only tracks pure conjured symbols. RetainCountChecker is affected, as demonstrated by the newly added test. Not sure if it's possible to make CStringChecker track a derived symbol and forget a string length due to that effect. MacOSKeychainAPI checker should also be affected, i think, but i didn't try. NullabilityChecker is probably unaffected; its behavior on dead symbols is pretty weird, but it seems safe; it also has a separate bug that was exposed but i'll address in a separate patch.

NoQ added a comment.Nov 1 2018, 4:33 PM

Just to be clear - this newly found problem is also automatically fixed by that patch.

Szelethus set the repository for this revision to rC Clang.Nov 5 2018, 6:21 AM

I read through your patch multiple times, have read your mail on cfe-dev, and I still couldn't find eve minor nits, well done!

However, both in the mail, and in this discussion, there are a lot of invaluable information about the inner workings of the analyzer, that I fear will be lost once this revision is closed, even though I don't think the main logic will change in the near future. It would be neat to document it for the next generation somewhere.

Another thing, since we're directly testing the functionality of SymbolReaper, maybe it'd be worth to also include unit tests.

I'll probably revisit this revision as I read through your followup patches, but I can't seem to find errors in the main logic straight away. GG!

NoQ updated this revision to Diff 173487.Nov 9 2018, 6:54 PM

Add an interesting test for the MisusedMovedObject checker that demonstrates one more potential source of false positives caused by the zombie symbol problem. In this test there are, well, no symbols. Therefore, there are no dead symbols or zombie symbols. Therefore SymReaper.hasDeadSymbols() is always false. Therefore checkDeadSymbols() is never called at all. However, MisusedMovedObject checker is not interested in symbols; it is only interested in regions, including concrete regions that aren't based on symbols. So it was missing the checkDeadSymbols() callback that would have unmarked the region for variable e (in inlined function or not in inlined function - doesn't matter). And next time it sees variable e in that function within the same stack frame, it thinks it's the same variable that has just been moved.

This problem was already discussed in D24246?id=82469#inline-249803.

Add tests in loop-block-counts.c that demonstrate the other source of the problem in MisusedMovedObject: in fact, variable e should not be the same variable on different iterations of the loop. In case of the inlined function, the problem is caused by how our StackFrameContext doesn't contain "block count" for the entrance - which is a hack to discriminate between different iterations of the loop that is used for, eg., conjured symbols, but, unfortunately, not for addresses of variables / temporaries. In case of non-inlined functions, the problem is deeper: we simply don't have a LocationContext for a single loop iteration, so there's no way we can discriminate between loop locals on different loop iterations by their memory spaces.

NoQ added a comment.Nov 9 2018, 7:00 PM

The manual region cleanup is removed from MisusedMovedObjectChecker in D54372.

Another thing, since we're directly testing the functionality of SymbolReaper, maybe it'd be worth to also include unit tests.

Sounds like a great idea, and/or i should add ExprInspection-based tests into test/Analysis/symbol-reaper.c.

NoQ updated this revision to Diff 173932.Nov 13 2018, 1:34 PM
In D18860#1284805, @NoQ wrote:

RetainCountChecker is affected, as demonstrated by the newly added test.

RetainCountChecker is affected due to temporary insanity in the checker that was induced by trusting summaries of inlined calls, which was reverted in D53902. I'll keep the FP/TN test, but i'll need to come up with a new test in order to test whatever i wanted to test, which will most likely be a unit test.

NoQ added a comment.Nov 13 2018, 3:14 PM

Nope, unit tests aren't quite useful here. I can't unit-test the behavior of API that i'm about to remove... right?

In D18860#1297742, @NoQ wrote:

Nope, unit tests aren't quite useful here. I can't unit-test the behavior of API that i'm about to remove... right?

Hmmm, can't really argue with that, can I? :D

Let's also have the link to your most recent mail about this issue here: http://lists.llvm.org/pipermail/cfe-dev/2018-November/060038.html

I re-read the mailing list conversation from 2016, @szepet's MisusedMoveChecker patch, so I have a better graps on whats happening.

I think this really is non-trivial stuff. When I initially read your workbook, SymbolReaper (along with what inlining really is) was among the biggest unsolved mysteries for me. The explanation of it you wrote back in 2016[1] is awesome, I feel enlightened after reading it, it's very well constructed, but I find it unfortunate that it took me almost a year of working in the analyzer to find it. After looking at SymbolManager.cpp's history, I can see that the logic didn't change since, I'd very strongly prefer to have this goldnugget copy-pasted even to a simple txt file, and have it commited with this patch.

[1] http://lists.llvm.org/pipermail/cfe-dev/2016-March/048142.html

In D18860#1297742, @NoQ wrote:

Nope, unit tests aren't quite useful here. I can't unit-test the behavior of API that i'm about to remove... right?

Hmmm, can't really argue with that, can I? :D

Well, we probably should implement unit tests for these anyways, SymbolReaper is not only performance-critical, but is also among the few data structures that's an essential part of the analysis, and it's change in functionality could be very easily shown in dedicated test files. But that's an issue for another time, and until then, debug.ExprInspection still does the job well enough. Let's not delay this patch longer just because of that, and enjoy the post zombie apocalypse analyzer.

NoQ added a comment.Nov 29 2018, 6:52 PM

I'd very strongly prefer to have this goldnugget copy-pasted even to a simple txt file, and have it commited with this patch.

I guess i'll do it a bit later because it needs to be cleaned up after the behavior changes with this patch. There are a few other such discussions that i wanted to add to our docs.

This revision was automatically updated to reflect the committed changes.