Page MenuHomePhabricator

[analyzer] False positive refutation with Z3
ClosedPublic

Authored by mikhail.ramalho on Apr 11 2018, 7:15 AM.

Details

Summary

This is a prototype of a bug reporter visitor that invalidates bug reports by re-checking constraints of certain states on the bug path using the Z3 constraint manager backend. The functionality is available under the crosscheck-with-z3 analyzer config flag.

Diff Detail

Repository
rC Clang

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
mikhail.ramalho marked 6 inline comments as done.May 29 2018, 1:07 PM
  1. Moved FalsePositiveRefutationBRVisitor::Profile definition to BugReporterVisitor.cpp
  2. Update test cases two run twice, with and without the crosscheck
  3. Removed the FirstNodeVisited flag (the solver is being reset after checking the bug reachability)
  4. Use ranged loop when adding the constraints
lib/StaticAnalyzer/Core/BugReporter.cpp
3153

Unless I'm mistaken, visitors are run in the order they are being declared.
It seems to me we would want to register our visitor first, as it does not make sense to run diagnostics-visitors if we have already deemed the path to be unfeasible.

Probably LikelyFalsePositiveSuppressionBRVisitor should be even before that.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

(apologies in advance for nitpicking not on your code).

Currently, this is written in a stateful way: we have a solver, at each iteration we add constraints, and at the end we reset it. To me it would make considerably more sense to write the code in a functional style: as we go, generate a vector of formulas, then once we reach the path end, create the solver object, check satisfiability, and then destroy the entire solver.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
923 ↗(On Diff #148969)

@mikhail.ramalho I know the first version was not yours, but could you write a doxygen comment explaining the semantics of all parameters? (I know we are guilty for not writing those often).

I am also quite confused by the semantics of OnlyPurged variable.

1249 ↗(On Diff #148969)

solver can also return "unknown", what happens then?

1259 ↗(On Diff #148969)

TBH I'm really confused here. Why does the method take two constraint ranges? What's OnlyPurged? From reading the code it seems it's set by seeing whether the program point only purges dead symbols, but at least a comment should be added as to why this affects behavior.

1264 ↗(On Diff #148969)

I would guess that this is an optimization done in order not to re-add the constraints we already have.
I think we should really not bother doing that, as Z3 will do a much better job here then we can.

1267 ↗(On Diff #148969)

almost certainly a bug, we shouldn't default to unfeasible when the list of constraints is empty.

1278 ↗(On Diff #148969)

I'm really curious where does it happen and why.

george.karpenkov requested changes to this revision.May 29 2018, 4:14 PM
This revision now requires changes to proceed.May 29 2018, 4:14 PM
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h
496 ↗(On Diff #148969)

See the comment below, I think we should not have this manager here. Just create one in the visitor constructor.

563 ↗(On Diff #148969)

This should be deleted as well (see the comment above)

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

Elaborating more: we are already forced to have visitor object state, let's use that. RefutationMgr is essentially a wrapper around a Z3 solver object, let's just create one when visitor is constructed (directly or in unique_ptr) and then rely on the destructor to destroy it.
Then no reset is necessary.

lib/StaticAnalyzer/Core/ProgramState.cpp
83 ↗(On Diff #148969)

This could be removed as well (see the comment above)

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
919 ↗(On Diff #148969)

reset should be removed, see comments above.

1246 ↗(On Diff #148969)

I would remove this, see comments above.

1292 ↗(On Diff #148969)

I'm very confused as to why are we doing disjunctions here.

NoQ added inline comments.May 29 2018, 5:26 PM
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1292 ↗(On Diff #148969)

I think this corresponds to RangeSet being a union of Ranges.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1267 ↗(On Diff #148969)

Ooops, sorry, now I see how the code is supposed to work.

1292 ↗(On Diff #148969)

Ah, thanks, right! Then my previous comment regarding false is wrong.

rnkovacs added inline comments.May 29 2018, 10:57 PM
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1249 ↗(On Diff #148969)

If it returns Z3_L_UNDEF, e.g. in case of a timeout, this assumes that the state was feasible because we couldn't prove the opposite. In that case the report won't be invalidated.

1259 ↗(On Diff #148969)

The logic was:

  • add every constraint from the last node (first visited),
  • for other nodes on the path, only add those that disappear in the next step.

So OnlyPurged is meant to signal that we only want to add those symbols to the solver that are getting purged from the program state.

1278 ↗(On Diff #148969)

I encountered some 1-bit APSInts that wouldn't work together with any other integer-handling logic. As @ddcc mentioned, he has a fix for that in D35450 (Z3ConstraintManager::fixAPSInt()).

xazax.hun added inline comments.May 30 2018, 12:08 AM
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

Note that while constructing the constraint solver here might make perfect sense now, it also inhibits incremental solving. If we do not plan to experiment with incremental solvers anytime soon I am fine with this direction as well.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1264 ↗(On Diff #148969)

Note that we are using lots of domain knowledge here like we have the most info about a symbol just before it dies. Also This optimization is a single lookup on the symbol level. I am not sure if Z3 could deal with this on the symbol level. It might need to do this on the constraint level.
My point is, I am perfectly fine removing this optimization but I would like to see some performance numbers first either on a project that exercises refutation quite a bit or on some synthetic test cases.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

@xazax.hun Right, I see.
However, we should not optimize prematurely --- IF we decide to have incremental solving, then we would change our design to support it.

Now I don't think incremental solving would help, and I don't think that having a global solver object would be helpful for it.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1249 ↗(On Diff #148969)

@rnkovacs that's possibly valid (though the exact behavior might need to be behind an option), but the current implementation is wrong and the decision should be made at a different stack level.
This method is responsible for "whether the model is valid", and it should not say "yes" when the solver returns "unknown". We could return an Optional<bool> here, or a tri-value logic type (IIRC LLVM had one, which could represent true/false/unknown)

1259 ↗(On Diff #148969)

@rnkovacs right, so that's an optimization not to add extra constraints?

  1. I would not do it at all, all formulas in Z3 are hash-consed, so a new formula would not even be *constructed* if it's already asserted in the solver.
  1. Even if we do do it (which we shouldn't), this logic does not belong to Z3ConstraintManager. The method should have simple semantics: take a state, add all constraints to solver.
1264 ↗(On Diff #148969)

This optimization is a single lookup on the symbol level. I am not sure if Z3 could deal with this on the symbol level

What do you mean? I'm positive adding redundant constraints to Z3 would not slow solving down.

I would like to see some performance numbers first either on a project that exercises refutation quite a bit or on some synthetic test cases.

"Premature optimization is a root of (most) evil".
It should totally be the other way around -- a simple correct solution should be implemented first, and then a benchmark could be used to justify adding an optimization.

1278 ↗(On Diff #148969)

@rnkovacs right, so this is a workaround for an existing bug? In that case a FIXME with a link to the revision you have mentioned should be added.
Ideally, a test with a FIXME should be added as well, but I understand if that's too complicated.

xazax.hun added inline comments.May 30 2018, 11:50 AM
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

Just a bit of context and to have some expectation management regarding this patch. The main purpose of this implementation was to back a thesis. It was made under a very serious time pressure and the main goal was to be able to measure on real world projects as soon as possible and in the meantime to be flexible so we can measure multiple configurations (like incremental solving).

So the goal was a flexible proof of concept that is sensible to measure in the shortest possible time. After the thesis was done, Reka started to work an another GSoC project, so she had no time to review the code with the requirements of upstreaming in mind. Nevertheless we found that sharing the proof of concept could be useful for the community. So it is perfectly reasonable if you disagree with some design decisions behind this patch, because the requirements for the thesis (in the short time frame) was very different from the requirements of upstreaming this work. In a different context these decisions made perfect sense.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1259 ↗(On Diff #148969)

We would not just add the very same constraints over and over again. We would first add the strongest possible constraint that the analyzer could infer first and later on add weaker and weaker ones. Does Z3 do some special handling for that as well?

1264 ↗(On Diff #148969)

While I do agree that we should not optimize prematurely this is not just an optimization. Having a minimal set of constraints is also useful for debugging when dumping the set of constraints that Z3 tries to solve.

Also, I think this optimization is quite simple, so I do not see removing it making the code much simpler. This is the reason why I would live to see some performance numbers first.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2382

@xazax.hun of course. My comments are for @mikhail.ramalho who is now working on this patch.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
1259 ↗(On Diff #148969)

@xazax.hun Since all subformulas are hash-consed, I would be extremely surprised if this heuristic affected performance in a large way.

However, your approach could be still useful in order to get readable dumps from the solver.
I would still argue it should be done in a visitor, so something along the lines of:

// a comment explaining the logic
if (State.succ_size() == 0 || State.getLocation().isPurged())
   solver.addConstraints(state)
1264 ↗(On Diff #148969)

OK your second point is valid, I think I've replied to it in my comment above.

@xazax.hun (I'll reply here to avoid scattering the conversation across many subtrees)

I was thinking about the optimization for not adding redundant constraints some more, and I've decided I'm still against it ---
we are creating a higher potential for bugs, and we are tightly coupling the visitor to an internal implementation detail (all formulas are eventually purged at purge locations),
which creates a more fragile code.

The proper way to do this would be to have a set of constraints, and then add all constraints there as we iterate through the states (and through constraints inside the state).
If we use the hashing function provided by Z3, the simple act of construction of a set would implicitly drop all redundant constraints.

@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.

@xazax.hun (I'll reply here to avoid scattering the conversation across many subtrees)

I was thinking about the optimization for not adding redundant constraints some more, and I've decided I'm still against it ---
we are creating a higher potential for bugs, and we are tightly coupling the visitor to an internal implementation detail (all formulas are eventually purged at purge locations),
which creates a more fragile code.

The proper way to do this would be to have a set of constraints, and then add all constraints there as we iterate through the states (and through constraints inside the state).
If we use the hashing function provided by Z3, the simple act of construction of a set would implicitly drop all redundant constraints.

I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?
(Since the visitor is running backward we will add s > 20 constraint first, but this should be irrelevant for the deduplication I guess.)

@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.

Seams reasonable.

I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?

No. But I thought in your optimization atoms inside the constraints would be the same?
Could you give an example where they are not?

I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?

No. But I thought in your optimization atoms inside the constraints would be the same?
Could you give an example where they are not?

So the logic in the current patch would be the following. When the symbol s dies it will be cleaned up from the state. For each symbol we will find the state where it was cleaned up. We will add the constraint for that symbol from the state before the cleanup which will contain the constraint s > 20. This is the only point where we add the constraints regarding the symbol s, so s > 10 later on while we are traversing the path backwards will not be added.

Code to trigger this behavior:

void f(int s) {
  if (s > 10) {
    // ...
    if (s > 20) {
        // trigger a warning
    }
  }
}

So the point of this optimization is to only add the ranges of a symbol once, where we have the most information about it. So strictly speaking it is not a deduplication on the constraint level but on the symbol level.

@xazax.hun

So strictly speaking it is not a deduplication on the constraint level but on the symbol level.

Right, apologies, I was initially mistaken then.
That's not even deduplication, I would call it using the interval solver to guide the constraint selection for the SMT solver.

That makes sense, but I'm worried about tight coupling between different features, and classes of bugs which may arise due to that.
It would be great to have it in a separate patch then.

mikhail.ramalho marked 6 inline comments as not done.
  • Simplified refutation process: it now collects all the constraints in a given path and, only when it reaches the root node, the refutation manager is created and the constraints are checked for reachability. All the optimizations were removed.
  • Moved RangedConstraintManager.h to include/
  • Moved refutation check to be the first in the list of BugVisitors
  • Added dump method to Z3Solver (to print the formula)
  • Added more documentation/comments

Hi,

Just a bit of context and to have some expectation management regarding
this patch. The main purpose of this implementation was to back a thesis.
It was made under a very serious time pressure and the main goal was to be
able to measure on real world projects as soon as possible and in the
meantime to be flexible so we can measure multiple configurations (like
incremental solving).

So the goal was a flexible proof of concept that is sensible to measure in
the shortest possible time. After the thesis was done, Reka started to work
an another GSoC project, so she had no time to review the code with the
requirements of upstreaming in mind. Nevertheless we found that sharing the
proof of concept could be useful for the community. So it is perfectly
reasonable if you disagree with some design decisions behind this patch,
because the requirements for the thesis (in the short time frame) was very
different from the requirements of upstreaming this work. In a different
context these decisions made perfect sense.

Just want to comment here and give thanks again for the first version of
the refutation code. It's being really helpful to develop the approach this
code as a base; things would definitely be slower if I had to start it from
scratch.

Thanks!

Just want to comment here and give thanks again for the first version of
the refutation code. It's being really helpful to develop the approach this
code as a base; things would definitely be slower if I had to start it from
scratch.

@mikhail.ramalho Thanks for this note, it's very nice of you :)
I'm glad if it saves a bit of time, but it's only a rough sketch, so please feel free to tailor it to your liking (and the reviewers' of course).

george.karpenkov requested changes to this revision.May 31 2018, 3:34 PM

Thanks, this is going in the right direction!

include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
182 ↗(On Diff #149317)

We don't need reset anymore.

183 ↗(On Diff #149317)

Making those virtual does not make much sense to me.
Returning true by default is not correct.
When we are using the visitor, we should already know we have a Z3ConstraintsManager, why can't we just use methods of that class?

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2378
  1. RefutationMgr should be created in the visitor constructor.
  2. At this point we should not check options; if the visitor is created, we are assuming that the option is on.
  3. Consequently, the subsequent assert should be dropped.
2392

That would be checking all constraints in all nodes one by one. I thought the idea was to encode all constraints from the entire path and then check all of it.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
889 ↗(On Diff #149317)

👍

925 ↗(On Diff #149317)

We don't need reset anymore.

928 ↗(On Diff #149317)

The semantics of this method is incorrect. It should return a tri-value somehow (e.g. Optional<bool>, and then higher-level logic in visitor should decide what to do with it.)

1272 ↗(On Diff #149317)

Since https://reviews.llvm.org/D47603 has landed we should drop this branch.

1282 ↗(On Diff #149317)

/*RetTy=*/nullptr

This revision now requires changes to proceed.May 31 2018, 3:34 PM
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
928 ↗(On Diff #149317)

We could also use ConditionTruthVal for this purpose.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2392

All the constraints are being added in the previous for loop, isModelFeasible only calls check().

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
925 ↗(On Diff #149317)

We don't need it but there's no reason to remove it, right? I might be useful in the future.

lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp
925 ↗(On Diff #149317)

We try to keep the code as small and as simple as possible so that it still achieves the task -- under that logic, unused methods should not be added.
I dislike reset in particular as it encourages stateful approach where the same instance is used for all queries, which increases the likelihood of bugs.

include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
183 ↗(On Diff #149317)

Z3ConstraintManager is fully contained inside a .cpp file, so we need isModelFeasible and addRangeConstraints to be exposed via its base class.

Another solution is to split Z3ConstraintManager into a .h and a .cpp file and include the header. We would then be able to use it directly, instead of through a ConstraintManager object.

I honestly prefer the latter. What do you think?

include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
183 ↗(On Diff #149317)

Yeah, I think we would need a header here.
In general we try to avoid inheritance and virtual functions unless they are very beneficial, and here they are not.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2392

Ah right, I see we are inside of the branch when pred_size() == 0.
Sorry, I was wrong -- but could we move out this code to a private function (could also simply use static function to avoid polluting the header)?

include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h
183 ↗(On Diff #149317)

Cool, I'll create a separate patch for that then.

@mikhail.ramalho I assume you know it, but just in case, you can mark dependencies in phabricator by adding "parent" revisions.

  • Simplified the API even further by constructing a Z3ConstraintManager object directly.
  • Update isModelFeasible to return a isModelFeasible
  • Update code with the fix for 1-bit long integer

Update patch based on D47640 and D47689.

I updated the test case as the cross check is not marking the true bug as invalid anymore.

My make clang-test is still failing Driver/response-file.c whenever I compile clang with Z3. I'll update the patch as soon as I find the reason why.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2366

I'm not happy about this cast. Suggestions are welcome.

george.karpenkov requested changes to this revision.Jun 3 2018, 1:50 PM

I updated the test case as the cross check is not marking the true bug as invalid anymore.

Awesome! Does it mean that the optimization for adding less constraints was in fact buggy?

My make clang-test is still failing Driver/response-file.c whenever I compile clang with Z3. I'll update the patch as soon as I find the reason why.

  1. You shouldn't use make, use ninja (also make sure you use gold linker, default linker takes forever on Linux)
  2. Could it be something unrelated to your changes? Any given trunk version can be buggy, but usually those are resolved very fast, so if you update now the issue can go away.

Watching cfe-commits mailing list might be helpful there.

Otherwise looks good apart from minor naming nit! I guess we could figure out the casting issue later.

lib/StaticAnalyzer/Core/BugReporterVisitors.cpp
2360

we would need a more descriptive name, e.g. isUnfeasible or similar.
from bool check_constraints it's unclear when false is returned.

2366

well yeah, CreateZ3ConstraintManager should return an SMTConstraintManager.
I don't fully understand the problem there, I'll try to take a look.

This revision now requires changes to proceed.Jun 3 2018, 1:50 PM

Awesome! Does it mean that the optimization for adding less constraints
was in fact buggy?

I pretty sure it was not related to the optimizations, I removed them days
ago (in the previous version of this patch) and the bug was still there.

  1. Could it be something unrelated to your changes? Any given trunk version can be buggy, but usually those are resolved very fast, so if you update now the issue can go away.

    Watching cfe-commits mailing list might be helpful there.

I update my repo every other day and it's been happening for the past
two/three weeks :/

The compiler shows the following error:

posix_spawn failed: Argument list too long

There are some discussions in several places about it.

I pretty sure it was not related to the optimizations, I removed them days

ago (in the previous version of this patch) and the bug was still there.

OK so any idea what the change could have been? Clearly the bug was there but not now. Anyway, should be OK to commit now.

I update my repo every other day and it's been happening for the past

two/three weeks :/

If it happens with your patch reverted as well, then it's unrelated, and we should just commit.

mikhail.ramalho edited the summary of this revision. (Show Details)

Fix naming issue.

george.karpenkov accepted this revision.Jun 3 2018, 9:36 PM
This revision is now accepted and ready to land.Jun 3 2018, 9:36 PM
Closed by commit rC333903: [analyzer] False positive refutation with Z3 (authored by mramalho, committed by ). · Explain WhyJun 4 2018, 7:44 AM
This revision was automatically updated to reflect the committed changes.