This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][Z3-refutation] Fix a refutation BugReporterVisitor bug
ClosedPublic

Authored by steakhal on Apr 19 2020, 9:53 AM.

Details

Summary

FalsePositiveRefutationBRVisitor had a bug where the constraints were not
properly collected thus crosschecked with Z3.
This patch demonstratest and fixes that bug.

Bug:
The visitor wanted to collect all the constraints on a BugPath.
Since it is a visitor, it stated the visitation of the BugPath with the node
before the ErrorNode. As a final step, it visited the ErrorNode explicitly,
before it processed the collected constraints.

In principle, the ErrorNode should have visited before every other node.
Since the constraints were collected into a map, mapping each symbol to its
RangeSet, if the map already had a mapping with the symbol, then it was skipped.

This behavior was flawed if:
We already had a constraint on a symbol, but az the end in the ErrorNode we have
a stricter constraint on that. Therefore, this visitor would not utilize that
stricter constraint during the crosscheck validation.

Diff Detail

Event Timeline

steakhal created this revision.Apr 19 2020, 9:53 AM

As turned out we don't even need a BugReporterVisitor for doing the crosscheck.
We should simply use the constraints of the ErrorNode since that has all the necessary information.

This should not be true. But we should definitely have a test case that proves this. The main idea is that unused symbols can be garbage collected. The problem is that the ErrorNode does not have any information about the symbols that were garbage collected earlier. This is why we need the visitor.

As turned out we don't even need a BugReporterVisitor for doing the crosscheck.
We should simply use the constraints of the ErrorNode since that has all the necessary information.

This should not be true. But we should definitely have a test case that proves this. The main idea is that unused symbols can be garbage collected. The problem is that the ErrorNode does not have any information about the symbols that were garbage collected earlier. This is why we need the visitor.

You might be right. Could you give a short example to a garbage-collected symbol?
Either way, we need to fix the bug which I stated earlier.

I tried to create a unit-test for the bug, but I miserably failed.
To call the visitor::finalizeVisitor you would need almost the entire world. (A ExprEngine, a CompilerInstance, etc.)
I also considered a lit-test, though I really doubt that is the right tool for testing this.
Since we are constantly improving the checkers, we should not bake in a false-positive bugreport which is then invalidated by the Z3 visitor...

As turned out we don't even need a BugReporterVisitor for doing the crosscheck.
We should simply use the constraints of the ErrorNode since that has all the necessary information.

This should not be true. But we should definitely have a test case that proves this. The main idea is that unused symbols can be garbage collected. The problem is that the ErrorNode does not have any information about the symbols that were garbage collected earlier. This is why we need the visitor.

If a symbol is unused and garbage collected then that is not part of the path constraint that leads to the ErrorNode, is it? So why should we care about constraints on an unused symbol?

xazax.hun added a comment.EditedApr 20 2020, 2:01 AM

If a symbol is unused and garbage collected then that is not part of the path constraint that leads to the ErrorNode, is it? So why should we care about constraints on an unused symbol?

Unused means it is unused at a program point.
For example:

void f(int sym1, int sym2) {
  int sym3 = sym1 * sym2;
  if (!sym1)
    return;
  // Here sym1 is no longer used, it is garbage collected from the state.
  if (sym3) {
    sym2 / 0; // This is a false positive. sym1 is 0 -> sym3 should be 0 as well, this branch is never taken. But to refute this path we need the info about sym1, that is not available in the ErrorNode state. It was garbage collected earlier.
  } 
}

I did not try this specific example. The constraint manager might be smart enough to not to report a false positive, but I hope it makes the idea clear why do we need the garbage collected symbols.

One way to test this change would be to add a statistic that is bumped each time a path is refuted (a report to be refuted we need all of the paths refuted, so using refuted reports might not be fine-grained enough). We can test on some big projects if the refuted paths increase or decrease after the change.

NoQ added a comment.Apr 20 2020, 2:13 AM

Yup, just take a symbol that is completely unrelated to the bug report, make an impossible constraint on it that our normal constraint manager can't refute, forget about it. That'll be the example.

NoQ added a comment.Apr 20 2020, 2:15 AM

If a symbol is unused and garbage collected then that is not part of the path constraint that leads to the ErrorNode, is it?

It is part of the path, it's just no longer accessible in the program, so it can't be observed at the end of the path. But it did play an important role at some point in the path.

NoQ added a comment.Apr 20 2020, 2:26 AM

(sry for the noise, just keep thinking of new things to reply)

To call the visitor::finalizeVisitor you would need almost the entire world. (A ExprEngine, a CompilerInstance, etc.)

We already have unittests that mock these objects, cf. ExprEngineConsumer (feel free to extend as necessary).

steakhal updated this revision to Diff 259532.Apr 23 2020, 5:12 AM
  • keep the visitor
  • fix the bug
  • add test demonstrating the bug and the fix
steakhal marked an inline comment as done.Apr 23 2020, 5:13 AM
steakhal added inline comments.
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2816

How can we simplify this?

xazax.hun added inline comments.Apr 23 2020, 5:56 AM
clang/lib/StaticAnalyzer/Core/BugReporter.cpp
2751

Is this function used anywhere?

martong added inline comments.Apr 23 2020, 7:47 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2811–2812

Probably you wanted to use hasContradictionUsingZ3 starting from here. (?)

2816

Could we change the visitation logic to start with the EndPathNode? I mean could we exercise FalsePositiveRefutationBRVisitor to start the visitation from EndPathNode? If that was possible then this whole patch would be way simpler.
In BugReporter.cpp:

// Run visitors on all nodes starting from the node *before* the last one.
// The last node is reserved for notes generated with {@code getEndPath}.
const ExplodedNode *NextNode = ErrorNode->getFirstPred();
while (NextNode) {

Why can't we have a different visitation order implemented specifically for the refutation visitor? (@NoQ, @xazax.hun)

clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
194 ↗(On Diff #259532)

x and y -> x and n.

197 ↗(On Diff #259532)

Perhaps reportIfTrue(y > 4) would be easier to understand here. But then probably you also need a rename: ZERO_STATE_SHOULD_NOT_EXIST -> STATE_SHOULD_NOT_EXIST

martong added inline comments.Apr 23 2020, 7:52 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2816

Hmm, to answer my own question, then we would visit the ErrorNode twice. So then perhaps we should not allow any constraints in the ErrorNodes, right? What if we'd split all such ErrorNodes into a PrevNode with the constraints and a simple ErrorNode without the constraints?

steakhal updated this revision to Diff 259674.Apr 23 2020, 12:24 PM
steakhal marked 7 inline comments as done.Apr 23 2020, 12:30 PM

I'm really embarrassed that I uploaded the wrong diff.
I appologize.

Most of the comments are done :)

xazax.hun accepted this revision.Apr 23 2020, 2:12 PM

LGTM, thanks!

This revision is now accepted and ready to land.Apr 23 2020, 2:12 PM

@steakhal You might want to update the patch summary before committing this to the upstream (it still mentions "not needing a visitor" 😄)

martong added inline comments.Apr 24 2020, 7:49 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2814

If there is really no other way (see my comments below) then perhaps OverwriteConstraintsOnExistingSyms would be a better naming than OnlyForNewSymbols because it expresses what exactly is happening.
In either case, could you add some documentation (comments) to addConstraints? What it does, and what is the exact meaning of the param?

2816

@xazax.hun, and others: Any thoughts on my comments above? What do you think, would it be possible to split the error node into two nodes? A PrevNode with the constraints and a simple ErrorNode without the constraints?

clang/unittests/StaticAnalyzer/FalsePositiveRefutationBRVisitorTest.cpp
194 ↗(On Diff #259532)

It's not done. It is still x and y but should be x and n, isn't it?

xazax.hun added inline comments.Apr 24 2020, 8:12 AM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2816

I think this would make a lot of sense in the context of this visitor. But we would need to check if it also makes sense in the context of the other parts of the analyzer (other visitors and mechanisms like bug report generation). I'd prefer such a change to be a separate patch so we can assess it easier whether it makes things easier or more complicated. I think it might be worth to experiment with.

steakhal marked 2 inline comments as done.Apr 25 2020, 8:57 AM
steakhal updated this revision to Diff 260108.Apr 25 2020, 9:00 AM
steakhal retitled this revision from [analyzer][Z3-refutation] Fix refutation BugReporterVisitor bug and refactor to [analyzer][Z3-refutation] Fix a refutation BugReporterVisitor bug.
steakhal edited the summary of this revision. (Show Details)
  • fix patch summary
  • fix patch title
  • renamed function parameter to OverwriteConstraintsOnExistingSyms
  • fixed x+y to x+n in the test comment
martong accepted this revision.Apr 27 2020, 1:35 AM

LGTM! Thanks for addressing the comments!

I'm planning to measure how impactful this patch is using the CodeChecker.
Stay tuned if you are interested!

Nice work, @steakhal!

I remember discussing with @NoQ a way of going backward on the bug report, from the error node, and never add constraints of symbols that were already collected; that way we would always have the tighter constraints.

I don't quite remember if I ever implemented it that way though.

NoQ added inline comments.May 3 2020, 1:58 PM
clang/lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2816

We could add a new visitor callback, like VisitErrorNode or something, and call it before visiting all other nodes. It won't affect other visitors because they don't subscribe to it. Then drop the update in finalizeVisitor in this visitor. Visitor callbacks are currently messy; it's hard to understand what exactly they're doing by looking at their name. A cleanup is super welcome.

This revision was automatically updated to reflect the committed changes.