This is a prototype of a bug reporter visitor that invalidates bug reports by re-checking constraints of certain states on the bug path using the Z3 constraint manager backend. The functionality is available under the crosscheck-with-z3 analyzer config flag.
Details
Diff Detail
- Repository
- rC Clang
Event Timeline
- Moved FalsePositiveRefutationBRVisitor::Profile definition to BugReporterVisitor.cpp
- Update test cases two run twice, with and without the crosscheck
- Removed the FirstNodeVisited flag (the solver is being reset after checking the bug reachability)
- Use ranged loop when adding the constraints
lib/StaticAnalyzer/Core/BugReporter.cpp | ||
---|---|---|
3153 | Unless I'm mistaken, visitors are run in the order they are being declared. Probably LikelyFalsePositiveSuppressionBRVisitor should be even before that. | |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2382 | (apologies in advance for nitpicking not on your code). Currently, this is written in a stateful way: we have a solver, at each iteration we add constraints, and at the end we reset it. To me it would make considerably more sense to write the code in a functional style: as we go, generate a vector of formulas, then once we reach the path end, create the solver object, check satisfiability, and then destroy the entire solver. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
923 ↗ | (On Diff #148969) | @mikhail.ramalho I know the first version was not yours, but could you write a doxygen comment explaining the semantics of all parameters? (I know we are guilty for not writing those often). I am also quite confused by the semantics of OnlyPurged variable. |
1249 ↗ | (On Diff #148969) | solver can also return "unknown", what happens then? |
1259 ↗ | (On Diff #148969) | TBH I'm really confused here. Why does the method take two constraint ranges? What's OnlyPurged? From reading the code it seems it's set by seeing whether the program point only purges dead symbols, but at least a comment should be added as to why this affects behavior. |
1264 ↗ | (On Diff #148969) | I would guess that this is an optimization done in order not to re-add the constraints we already have. |
1267 ↗ | (On Diff #148969) | almost certainly a bug, we shouldn't default to unfeasible when the list of constraints is empty. |
1278 ↗ | (On Diff #148969) | I'm really curious where does it happen and why. |
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h | ||
---|---|---|
496 ↗ | (On Diff #148969) | See the comment below, I think we should not have this manager here. Just create one in the visitor constructor. |
563 ↗ | (On Diff #148969) | This should be deleted as well (see the comment above) |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2382 | Elaborating more: we are already forced to have visitor object state, let's use that. RefutationMgr is essentially a wrapper around a Z3 solver object, let's just create one when visitor is constructed (directly or in unique_ptr) and then rely on the destructor to destroy it. | |
lib/StaticAnalyzer/Core/ProgramState.cpp | ||
83 ↗ | (On Diff #148969) | This could be removed as well (see the comment above) |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
919 ↗ | (On Diff #148969) | reset should be removed, see comments above. |
1246 ↗ | (On Diff #148969) | I would remove this, see comments above. |
1292 ↗ | (On Diff #148969) | I'm very confused as to why are we doing disjunctions here. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
1292 ↗ | (On Diff #148969) | I think this corresponds to RangeSet being a union of Ranges. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
1249 ↗ | (On Diff #148969) | If it returns Z3_L_UNDEF, e.g. in case of a timeout, this assumes that the state was feasible because we couldn't prove the opposite. In that case the report won't be invalidated. |
1259 ↗ | (On Diff #148969) | The logic was:
So OnlyPurged is meant to signal that we only want to add those symbols to the solver that are getting purged from the program state. |
1278 ↗ | (On Diff #148969) |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | Note that while constructing the constraint solver here might make perfect sense now, it also inhibits incremental solving. If we do not plan to experiment with incremental solvers anytime soon I am fine with this direction as well. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1264 ↗ | (On Diff #148969) | Note that we are using lots of domain knowledge here like we have the most info about a symbol just before it dies. Also This optimization is a single lookup on the symbol level. I am not sure if Z3 could deal with this on the symbol level. It might need to do this on the constraint level. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | @xazax.hun Right, I see. Now I don't think incremental solving would help, and I don't think that having a global solver object would be helpful for it. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1249 ↗ | (On Diff #148969) | @rnkovacs that's possibly valid (though the exact behavior might need to be behind an option), but the current implementation is wrong and the decision should be made at a different stack level. |
1259 ↗ | (On Diff #148969) | @rnkovacs right, so that's an optimization not to add extra constraints?
|
1264 ↗ | (On Diff #148969) |
What do you mean? I'm positive adding redundant constraints to Z3 would not slow solving down.
"Premature optimization is a root of (most) evil". |
1278 ↗ | (On Diff #148969) | @rnkovacs right, so this is a workaround for an existing bug? In that case a FIXME with a link to the revision you have mentioned should be added. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | Just a bit of context and to have some expectation management regarding this patch. The main purpose of this implementation was to back a thesis. It was made under a very serious time pressure and the main goal was to be able to measure on real world projects as soon as possible and in the meantime to be flexible so we can measure multiple configurations (like incremental solving). So the goal was a flexible proof of concept that is sensible to measure in the shortest possible time. After the thesis was done, Reka started to work an another GSoC project, so she had no time to review the code with the requirements of upstreaming in mind. Nevertheless we found that sharing the proof of concept could be useful for the community. So it is perfectly reasonable if you disagree with some design decisions behind this patch, because the requirements for the thesis (in the short time frame) was very different from the requirements of upstreaming this work. In a different context these decisions made perfect sense. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1259 ↗ | (On Diff #148969) | We would not just add the very same constraints over and over again. We would first add the strongest possible constraint that the analyzer could infer first and later on add weaker and weaker ones. Does Z3 do some special handling for that as well? |
1264 ↗ | (On Diff #148969) | While I do agree that we should not optimize prematurely this is not just an optimization. Having a minimal set of constraints is also useful for debugging when dumping the set of constraints that Z3 tries to solve. Also, I think this optimization is quite simple, so I do not see removing it making the code much simpler. This is the reason why I would live to see some performance numbers first. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | @xazax.hun of course. My comments are for @mikhail.ramalho who is now working on this patch. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1259 ↗ | (On Diff #148969) | @xazax.hun Since all subformulas are hash-consed, I would be extremely surprised if this heuristic affected performance in a large way. However, your approach could be still useful in order to get readable dumps from the solver. // a comment explaining the logic if (State.succ_size() == 0 || State.getLocation().isPurged()) solver.addConstraints(state) |
1264 ↗ | (On Diff #148969) | OK your second point is valid, I think I've replied to it in my comment above. |
@xazax.hun (I'll reply here to avoid scattering the conversation across many subtrees)
I was thinking about the optimization for not adding redundant constraints some more, and I've decided I'm still against it ---
we are creating a higher potential for bugs, and we are tightly coupling the visitor to an internal implementation detail (all formulas are eventually purged at purge locations),
which creates a more fragile code.
The proper way to do this would be to have a set of constraints, and then add all constraints there as we iterate through the states (and through constraints inside the state).
If we use the hashing function provided by Z3, the simple act of construction of a set would implicitly drop all redundant constraints.
@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.
I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?
(Since the visitor is running backward we will add s > 20 constraint first, but this should be irrelevant for the deduplication I guess.)
@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.
Seams reasonable.
I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?
No. But I thought in your optimization atoms inside the constraints would be the same?
Could you give an example where they are not?
So the logic in the current patch would be the following. When the symbol s dies it will be cleaned up from the state. For each symbol we will find the state where it was cleaned up. We will add the constraint for that symbol from the state before the cleanup which will contain the constraint s > 20. This is the only point where we add the constraints regarding the symbol s, so s > 10 later on while we are traversing the path backwards will not be added.
Code to trigger this behavior:
void f(int s) { if (s > 10) { // ... if (s > 20) { // trigger a warning } } }
So the point of this optimization is to only add the ranges of a symbol once, where we have the most information about it. So strictly speaking it is not a deduplication on the constraint level but on the symbol level.
So strictly speaking it is not a deduplication on the constraint level but on the symbol level.
Right, apologies, I was initially mistaken then.
That's not even deduplication, I would call it using the interval solver to guide the constraint selection for the SMT solver.
That makes sense, but I'm worried about tight coupling between different features, and classes of bugs which may arise due to that.
It would be great to have it in a separate patch then.
- Simplified refutation process: it now collects all the constraints in a given path and, only when it reaches the root node, the refutation manager is created and the constraints are checked for reachability. All the optimizations were removed.
- Moved RangedConstraintManager.h to include/
- Moved refutation check to be the first in the list of BugVisitors
- Added dump method to Z3Solver (to print the formula)
- Added more documentation/comments
Hi,
Just a bit of context and to have some expectation management regarding
this patch. The main purpose of this implementation was to back a thesis.
It was made under a very serious time pressure and the main goal was to be
able to measure on real world projects as soon as possible and in the
meantime to be flexible so we can measure multiple configurations (like
incremental solving).So the goal was a flexible proof of concept that is sensible to measure in
the shortest possible time. After the thesis was done, Reka started to work
an another GSoC project, so she had no time to review the code with the
requirements of upstreaming in mind. Nevertheless we found that sharing the
proof of concept could be useful for the community. So it is perfectly
reasonable if you disagree with some design decisions behind this patch,
because the requirements for the thesis (in the short time frame) was very
different from the requirements of upstreaming this work. In a different
context these decisions made perfect sense.
Just want to comment here and give thanks again for the first version of
the refutation code. It's being really helpful to develop the approach this
code as a base; things would definitely be slower if I had to start it from
scratch.
Thanks!
@mikhail.ramalho Thanks for this note, it's very nice of you :)
I'm glad if it saves a bit of time, but it's only a rough sketch, so please feel free to tailor it to your liking (and the reviewers' of course).
Thanks, this is going in the right direction!
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
182 ↗ | (On Diff #149317) | We don't need reset anymore. |
183 ↗ | (On Diff #149317) | Making those virtual does not make much sense to me. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2378 |
| |
2392 | That would be checking all constraints in all nodes one by one. I thought the idea was to encode all constraints from the entire path and then check all of it. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
889 ↗ | (On Diff #149317) | 👍 |
925 ↗ | (On Diff #149317) | We don't need reset anymore. |
928 ↗ | (On Diff #149317) | The semantics of this method is incorrect. It should return a tri-value somehow (e.g. Optional<bool>, and then higher-level logic in visitor should decide what to do with it.) |
1272 ↗ | (On Diff #149317) | Since https://reviews.llvm.org/D47603 has landed we should drop this branch. |
1282 ↗ | (On Diff #149317) | /*RetTy=*/nullptr |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
928 ↗ | (On Diff #149317) | We could also use ConditionTruthVal for this purpose. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2392 | All the constraints are being added in the previous for loop, isModelFeasible only calls check(). | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
925 ↗ | (On Diff #149317) | We don't need it but there's no reason to remove it, right? I might be useful in the future. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
925 ↗ | (On Diff #149317) | We try to keep the code as small and as simple as possible so that it still achieves the task -- under that logic, unused methods should not be added. |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Z3ConstraintManager is fully contained inside a .cpp file, so we need isModelFeasible and addRangeConstraints to be exposed via its base class. Another solution is to split Z3ConstraintManager into a .h and a .cpp file and include the header. We would then be able to use it directly, instead of through a ConstraintManager object. I honestly prefer the latter. What do you think? |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Yeah, I think we would need a header here. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2392 | Ah right, I see we are inside of the branch when pred_size() == 0. |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Cool, I'll create a separate patch for that then. |
@mikhail.ramalho I assume you know it, but just in case, you can mark dependencies in phabricator by adding "parent" revisions.
- Simplified the API even further by constructing a Z3ConstraintManager object directly.
- Update isModelFeasible to return a isModelFeasible
- Update code with the fix for 1-bit long integer
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2366 | I'm not happy about this cast. Suggestions are welcome. |
I updated the test case as the cross check is not marking the true bug as invalid anymore.
Awesome! Does it mean that the optimization for adding less constraints was in fact buggy?
My make clang-test is still failing Driver/response-file.c whenever I compile clang with Z3. I'll update the patch as soon as I find the reason why.
- You shouldn't use make, use ninja (also make sure you use gold linker, default linker takes forever on Linux)
- Could it be something unrelated to your changes? Any given trunk version can be buggy, but usually those are resolved very fast, so if you update now the issue can go away.
Watching cfe-commits mailing list might be helpful there.
Otherwise looks good apart from minor naming nit! I guess we could figure out the casting issue later.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2360 | we would need a more descriptive name, e.g. isUnfeasible or similar. | |
2366 | well yeah, CreateZ3ConstraintManager should return an SMTConstraintManager. |
Awesome! Does it mean that the optimization for adding less constraints
was in fact buggy?
I pretty sure it was not related to the optimizations, I removed them days
ago (in the previous version of this patch) and the bug was still there.
- Could it be something unrelated to your changes? Any given trunk
version can be buggy, but usually those are resolved very fast, so if you
update now the issue can go away.Watching cfe-commits mailing list might be helpful there.
I update my repo every other day and it's been happening for the past
two/three weeks :/
The compiler shows the following error:
posix_spawn failed: Argument list too long
There are some discussions in several places about it.
I pretty sure it was not related to the optimizations, I removed them days
ago (in the previous version of this patch) and the bug was still there.
OK so any idea what the change could have been? Clearly the bug was there but not now. Anyway, should be OK to commit now.
I update my repo every other day and it's been happening for the past
two/three weeks :/
If it happens with your patch reverted as well, then it's unrelated, and we should just commit.