This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Fix for PR15623: eliminate unwanted ProgramState checker data propagation.
ClosedPublic

Authored by ayartsev on Jul 27 2016, 6:43 AM.

Details

Summary

The attached patch eliminates unneeded checker data propagation from one of the operands of a logical operation to the operation result. The result of a logical operation is calculated from the logical values of its operands and is independent from operands nature.

One of the test changed its result (misc-ps-region-store.m, rdar_7275774). I did not manage to understand the test, something is definitely wrong with it - at least the comment inside the test do not correspond to reality and an old test result seem to be wrong.

The patch fixes https://llvm.org/bugs/show_bug.cgi?id=15623.

Please review!

Diff Detail

Event Timeline

ayartsev updated this revision to Diff 65716.Jul 27 2016, 6:43 AM
ayartsev retitled this revision from to [analyzer] Fix for PR15623: eliminate unwanted ProgramState checker data propagation..
ayartsev updated this object.
ayartsev added reviewers: zaks.anna, krememek.
ayartsev added a subscriber: cfe-commits.
NoQ added a subscriber: NoQ.Jul 27 2016, 11:05 AM
zaks.anna edited edge metadata.Jul 27 2016, 10:13 PM

I am not sure it's the right way of fixing this problem and it introduces a regression. The bug is probably specific to "&&".

+ Devin as we might have seen something similar.

test/Analysis/misc-ps-region-store.m
332 ↗(On Diff #65716)

Try substituting 'p' with null and you will se that n must be zero in that case because, otherwise, we would take the early return branch. Since p is not null, we should not warn here.

This is a regression.

zaks.anna edited reviewers, added: dcoughlin; removed: krememek.Jul 27 2016, 10:13 PM
ayartsev added inline comments.Jul 28 2016, 12:11 AM
test/Analysis/misc-ps-region-store.m
332 ↗(On Diff #65716)

If we reached the line "unsigned short *p = (unsigned short*) data;" then ''data" is definitely null and "n" is definitely >0, otherwise we would take the early return branch. Then we have "p" is definitely null and "q" is either equal (if n == 1) or greater then "p". In case of n > 1 we definitely have a null dereference. Please tell what I'm missing.

NoQ added inline comments.Jul 28 2016, 1:49 AM
test/Analysis/misc-ps-region-store.m
332 ↗(On Diff #65716)

"data" is definitely null and "n" is definitely >0

"data" is definitely non-null or "n" is definitely =0.

We return on 'not-or', which means we continue on plain 'or'.

I also agree that the easiest way to understand that is to substitute data with null and see what happens.

zaks.anna requested changes to this revision.Jul 29 2016, 9:52 AM
zaks.anna edited edge metadata.
This revision now requires changes to proceed.Jul 29 2016, 9:52 AM
dcoughlin edited edge metadata.Jul 29 2016, 5:37 PM

As PR15623 shows, returning the existing cast is not correct. But rather than replace it with an unknown, here is a proposal for how to address this without regressing in precision.

Instead of using assumeDual() in ExprEngine::VisitLogicalExpr() on the RHSVal and returning 0, 1, or Unknown() depending on whether the true and false states are empty, we can return a symbolic expression representing "RHSVal != 0":

nonloc::ConcreteInt Zero(getBasicVals().getValue(0, B->getType()));
X = evalBinOp(N->getState(), BO_NE, getSValBuilder().evalCast(RHSVal, B->getType(), RHS->getType()), Zero, B->getType());

Now, evalBinOp() is ultimately doing its own assumeDual() under the hood, so at a high level this should do the same thing as we had before for the 0 and 1 cases -- and it will return a new symbol rather than the casted RHSVal (which is what causing the false positive leaks) when the assumption is unknown.

Unfortunately, this can easily create symbolic expressions of the form "((reg_$1<n>) == 0U) != 0", which the analyzer currently doesn't handle very well. Right now when assumed, that symbolic expression gets translated into a range constraint saying the symbol (reg_$1<n>) == 0U) is in [1, UINTMAX] rather than a range constraint saying that reg_$1<n> is in [0,0] as we might expect. This means that we lose precision in cases like:

if ((n == 0) != 0) {
  clang_analyzer_eval(n == 0); // This should be TRUE but we really currently get UNKNOWN
}

So my proposal is to teach SimpleConstraintManager::assumeSymRel() to simplify an assume of a constraint of the form: "(exp comparison_op expr) != 0" to true into an assume of "exp comparison_op expr" to true. (And similarly, an assume of the form "(exp comparison_op expr) == 0" to true as an assume of exp comparison_op expr to false.) This should be quite a small change.

This approach would improve precision overall, simplify ExprEngine::VisitLogicalExpr(), and avoid the leak false positive from PR15623 without causing the regression in misc-ps-region-store.m that Anna is worried about.

Does this seem reasonable?

NoQ added a comment.Aug 8 2016, 8:49 AM

Hmm. The test in unwanted-programstate-data-propagation.c passes on my machine even without the patch, and the return value on the respective path is correctly represented as (conj_$6{void *}) != 0U, which comes from the evalCast() call in VisitLogicalExpr() and is the default behavior of evalCast() for Loc to pointer casts. There seems to be something amiss.

@zaks.anna, sorry for the noise about the "misc-ps-region-store.m" test, my mistake.

In D22862#508674, @NoQ wrote:

Hmm. The test in unwanted-programstate-data-propagation.c passes on my machine even without the patch, and the return value on the respective path is correctly represented as (conj_$6{void *}) != 0U, which comes from the evalCast() call in VisitLogicalExpr() and is the default behavior of evalCast() for Loc to pointer casts. There seems to be something amiss.

Hm, updated to trunk, now the test passes without the patch. Changing "_Bool" to "int" in the test reproduces the issue.

Does this seem reasonable?

Thanks for the idea, working on the solution.

@dcoughlin, @NoQ, could you, please, tell, how you get dumps of symbolic expressions and constraints like "(conj_$6{void *}) != 0U"? Tried different debug.* checkers and clang_analyzer_explain() but failed.

NoQ added a comment.Aug 15 2016, 2:37 AM

@dcoughlin, @NoQ, could you, please, tell, how you get dumps of symbolic expressions and constraints like "(conj_$6{void *}) != 0U"? Tried different debug.* checkers and clang_analyzer_explain() but failed.

That's SVal.dump(), SymbolRef->dump(), MemRegion.dump() (you can also push those directly into llvm::errs()), ProgramStateRef->dump(), and ultimately ExprEngine.ViewExplodedGraph() (the last one can be activated in run-time with debug.ViewExplodedGraph checker or with -analyzer-viz-egraph-graphviz which was the one i used, see also http://clang-analyzer.llvm.org/checker_dev_manual.html#commands In fact clang-analyzer-explain() was an attempt to make these dumps a bit more understandable.

Hm, updated to trunk, now the test passes without the patch. Changing "_Bool" to "int" in the test reproduces the issue.

Aha, i see! The cast to int gets represented as a nonloc::LocAsInteger value:

&element{SymRegion{conj_$6{void *}},0 S64b,char} [as 32 bit integer]

which is, of course, incorrect, and Devin's fix makes perfect sense here in my opinion :)

ayartsev updated this revision to Diff 78810.Nov 21 2016, 5:43 PM
ayartsev edited edge metadata.

The updated patch implements Devin's solution. Please review.

dcoughlin accepted this revision.Nov 30 2016, 5:19 PM
dcoughlin edited edge metadata.

Overall, this looks good to me. Thanks for tackling this!

One request, though. Could you move the tests into already existing test files? We're trying to avoid test files that only test a single issue. This makes it easier for people who are new to the project to determine where a new test should go. I think test1 could go in perhaps malloc.c and test2 could go in misc-ps.c. It would also be good to have both test functions mention the PR so a future maintainer can tell they are related even though they are in different files.

zaks.anna accepted this revision.Mar 8 2017, 1:40 PM

Thi has been committed in r290505.

This revision is now accepted and ready to land.Mar 8 2017, 1:40 PM
zaks.anna closed this revision.Mar 8 2017, 1:40 PM