This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] [draft] Don't leave results of comparisons unknown
Needs ReviewPublic

Authored by a.sidorin on Sep 9 2016, 5:39 AM.

Details

Summary

This is a draft patch for a bug reported by Daniel Marjamäki. For now, I just need to ensure I'm going the right direction.
Test case (I'll add it to the patch later):

// RUN: %clang_cc1 -analyze -analyzer-checker=debug.ExprInspection -verify %s

void clang_analyzer_warnIfReached();
extern int Min, Max;

void testReachabilityFail(int b) {
  if (!(b >= Min && b < Max)) {
    return;
  } else {
    clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
    b++;
  }
}

A quote from mailing list explaining a problem:

There is my understanding of the problem.
We don't handle sym-sym comparisons well. The result of comparison between two symbols is an UnknownVal.
First, analyzer renders "b >= Min" branch and assumes it false. Then, it renders operator '&&', then makes a cleanup which results in a new node with an empty state (1). It proceeds with false branch to operator! and then - to ReturnStmt.

Then, it renders branch where (b >= Min) is true. It assigns another UnknownVal to b < Max, cleanups ... and tries to create another empty-stated node before entering operator!. But there is already a node with an empty state and in the ProgramPoint. So, two execution paths are merged unexpectedly. And the analysis finishes since 'operator!' was rendered already for this merged branch.

I tried a quick solution which is the conjuring a symbolic value for comparisons instead of leaving it Unknown. It seems to work but I'm not sure if it is exactly what we need here. Anna, Artem, what is your opinion? Should we proceed with review or it is a bad idea in general?

Diff Detail

Event Timeline

a.sidorin updated this revision to Diff 70814.Sep 9 2016, 5:39 AM
a.sidorin retitled this revision from to [analyzer] [draft] Don't leave results of comparisons unknown.
a.sidorin updated this object.
a.sidorin added reviewers: NoQ, zaks.anna, dcoughlin.
zaks.anna edited edge metadata.Sep 9 2016, 1:50 PM

Thanks for digging into this! Looks like we are "caching out" when we should not. When analyzer sees the same exact node twice, it should cache out. One use case of that is ensuring we do not infinitely go around a loop.

The node consists of a ProgramPoint (like 'pc' or a location) and ProgramState. From what I can tell, here the root problem is that the program location changed, but it is not represented in the node. (However, in the patch you are trying to fix it by modifying the Program State.) As for the next steps, could you double check that the CFG for this looks correct? What do the ProgramPoints for this code look like?

Please, include context in your patches (http://llvm.org/docs/Phabricator.html).
Thanks!

The ProgramPoint is PreStmtPurgeDeadSymbols. Since they are separated by Stmt and LocationContext only, they really seem to be same for both branches.

NoQ edited edge metadata.Sep 13 2016, 3:56 AM

While i agree that a conjured symbol is better than an unknown value, i still suspect a bigger problem here.

The question is, if the states are equal, where does the environment entry b >= Min && b < Max: 0 S32b come from on the original branch? It sounds to me as if we construct multiple program states while keeping "in mind" (as opposed to "in our nodes") that we need to go to the true branch after the terminator. If my guess is correct, we should bind the environment entry earlier and free our "mind" of the unnecessary burden, before we merge upon reaching PreStmtPurgeDeadSymbols.

The subtle-est bug i've seen so far, i think.