This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Prohibit ExplodedGraph's edges duplicating
AbandonedPublic

Authored by ilya-palachev on Dec 13 2016, 5:54 AM.

Details

Summary

Current implementation doesn't take care about the duplicates in edges
of ExplodedGraph. This actually happens when, for example, the checker
tries to add transition to node, and gets `nullptr', which means that
the node already exists and has been processed. In this case edge
between the node and its predecessor is duplicated.

This patch prohibits this situation, by checking whether the actual
group already contains the given predecessor.

Diff Detail

Repository
rL LLVM

Event Timeline

ilya-palachev retitled this revision from to [analyzer] Prohibit ExplodedGraph's edges duplicating.
ilya-palachev updated this object.
ilya-palachev added reviewers: NoQ, zaks.anna, dcoughlin.
ilya-palachev set the repository for this revision to rL LLVM.
ilya-palachev removed rL LLVM as the repository for this revision.

Fixed a typo

ilya-palachev set the repository for this revision to rL LLVM.Dec 13 2016, 6:23 AM
xazax.hun added inline comments.
lib/StaticAnalyzer/Core/CoreEngine.cpp
662

I prefer having the braces written in this case, but it is a minor nit.

Wouldn't using an algorithm like std::find work as well?

zaks.anna edited edge metadata.Dec 20 2016, 12:07 PM

Are there any negative effects from having the duplicates in edges?

When does this occur? It's a bit suspicious since we have a FromN, and a path is split at that node, resulting in successors that are the same. Is it possible that whoever split the state did not encode enough interesting info?

ilya-palachev added a comment.EditedDec 21 2016, 8:13 AM

Are there any negative effects from having the duplicates in edges?

Yes, you're right; in current implementation there seems to be no negative effects from this, since duplicate edges are quite rare, and they produce very small overhead on memory (just two additional pointers for each excessive edge).

When does this occur? It's a bit suspicious since we have a FromN, and a path is split at that node, resulting in successors that are the same. Is it possible that whoever split the state did not encode enough interesting info?

Yes, this does occur (for one of our checkers on Android).
This patch is more likely a request for comments for what to do in the situation discussed below...

Consider the checker that can emit warnings on multiple dead symbols (one warning for each buggy symbol). When the checker encounters such situation, it first emits a warning for the 1st symbol, then for the 2nd one. For the 2nd and other symbols the CheckerContext::addTransition returns `nullptr'. That means that the requested node already exists, because the State hasn't changed. And for each warning beginning from the 2nd, the additional edge is added for the ExplodedGraph. It can be even simply seen with DOT graph.

Yes, this problem can be resolved with checker tags for nodes, but in this case we'll need to create new tag for each such warning (because ProgramPoint and ProgramState are equal for them all).

Moreover, such cases can even lead to non-determinism in warnings.

The checker stores a set(map/list...) of buggy symbols (regions...) in GDM. When checkDeadSymbols callback happens, this set is iterated, and the checker tries to emit a warning for each of them. But actually the order in which symbols are stored in the set depends on their order as pointers, and it's controlled by allocator. The allocator contains non-determinism by design, and region for symbol A can be greater (as a pointer) than the region for the symbol B in one analysis run, and smaller during the another run.

That's why in such cases different warnings can be emitted from time to time. The discussed patch doesn't address this issue, but I'd like at least to discuss the situation.

To deal with non-determinism, you can sort the results by non-pointer values, such as identifiers, before producing the warnings.

It is not clear if you want to print all warnings or only the first one. Is it an option to list all dead symbols in one warning message? Do we support attaching multiple bug reports to the same node?

NoQ edited edge metadata.Dec 21 2016, 10:41 PM

I think MallocChecker should be (or should have been) affected by the problem you describe, so we can use it as an example.

Do we support attaching multiple bug reports to the same node?

That's what MallocChecker does, and it works:

void foo() {
  void *x = malloc(1);
  void *y = malloc(1);
  (void)(y - x);  // expected-warning{{'x'}} expected-warning{{'y'}}
}

If this is a common mistake for checker writers, we could consider adding assertions that check for this situation. We should make sure that we do not to add any release builds overhead. Maybe we could put this check behind NDEBUG or ensure that whatever code is added to support the assertion is optimized away.

As Artem points out, the checkers in tree do not create a node for every bug reported - there is no reason to do that.

Thanks for review!

As Artem points out, the checkers in tree do not create a node for every bug reported - there is no reason to do that.

Yes... But why does generateNonFatalErrorNode return nullptr in case when the node already exists? Isn't it designed so that to prevent multiple reports being thrown for the same node? If so, then the code contains a lot of such checks.

I don't understand why the following test passes, because each of two checkers: MallocChecker and SimpleStreamChecker are using generateNonFatalErrorNode method:

// RUN: %clang_cc1 -analyze -analyzer-checker=unix.Malloc,alpha.unix.SimpleStream -analyzer-store region -verify %s

typedef struct _IO_FILE FILE;
extern FILE *fopen(const char *path, const char *mode);
extern int fclose(FILE *fp);

typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t);
void free(void *);

void test() {
  void *p = malloc(1);
  void *r = fopen("foo", "r");
  if (r) {
    (void)(p - r);
    // expected-warning@-1{{Potential leak of memory}}
    // expected-warning@-2{{Opened file is never closed}}
  }
  else {
    fclose(r);
    free(p);
  }
}

Maybe it's possible to invent such a testcase when two checkers begin to conflict for the node. Or not?

NoQ added a comment.Jan 20 2017, 7:07 AM

I don't understand why the following test passes, because each of two checkers: MallocChecker and SimpleStreamChecker are using generateNonFatalErrorNode method:

That's because error nodes generated by different checkers have different program point, because they have different program point tags, because their tag is the checker that throws the report.

I'm referring to this funny code in Checker.h:473:

class CheckerBase : public ProgramPointTag

As Artem points out, the checkers in tree do not create a node for every bug reported - there is no reason to do that.

Yes... But why does generateNonFatalErrorNode return nullptr in case when the node already exists? Isn't it designed so that to prevent multiple reports being thrown for the same node? If so, then the code contains a lot of such checks.

I do not have a certain answer (i.e. bugs me as well), but this behavior allows knowing if this checker has already reported a bug on a different path that leads to the same predecessor node and therefore to the same error node.

So the use-case i'm a bit worried about is when a checker finds the same error node on a different path and suddenly wants to throw a different report (by bug kind). But i think it's normally impossible because the checker is never, in any callback, aware of the path it took before reaching the predecessor node, only the node itself.

I think i lost track of the original question's discussion>< Do we still believe that duplicated edges are a problem?

ilya-palachev abandoned this revision.Feb 9 2017, 7:30 AM

Ok, then, if you see no problem in dual edges, I'm abandoning this revision.