This is a prototype of a bug reporter visitor that invalidates bug reports by re-checking constraints of certain states on the bug path using the Z3 constraint manager backend. The functionality is available under the crosscheck-with-z3 analyzer config flag.
Details
Diff Detail
- Repository
- rC Clang
Event Timeline
Fixed logical operator in the Z3ConstraintManager::checkRangedStateConstraints() function.
The visitor currently checks states appearing as block edges in the exploded graph. The first idea was to filter states based on the shape of the exploded graph, by checking the number of successors of the parent node, but surprisingly, both succ_size() and pred_size() seemed to return 1 for each node in the graph (except for the root), even if there clearly were branchings in the code (and on the .dot picture). To my understanding, the exploded graph is fully constructed at the stage where visitors are run, so I must be missing something.
Aha, yep, that's probably because visitors are operating on the "trimmed" exploded graph. You can paint it via the -trim-egraph flag or by calling ViewGraph(1) in the debugger.
So, yeah, that's a good optimization that we're not invoking the solver on every node. But i don't think we should focus on improving this optimization further; instead, i think the next obvious step here is to implement it in such a way that we only needed to call the solver once for every report. We could simply collect all constraints from all states along the path and put them into the solver all together. This will work because symbols are not mutable and they don't reincarnate.
Apart from that, the patch seems to be going in the right direction. It should be possible to split up the RangeSet refactoring into a different review, for easier reviewing and better commit history.
Oh, thanks! That explains a lot.
So, yeah, that's a good optimization that we're not invoking the solver on every node. But i don't think we should focus on improving this optimization further; instead, i think the next obvious step here is to implement it in such a way that we only needed to call the solver once for every report. We could simply collect all constraints from all states along the path and put them into the solver all together. This will work because symbols are not mutable and they don't reincarnate.
Won't collecting all constraints and solving a ~100ish equations at once take a long time? Maybe the timeout limit for Z3 will need to be slightly increased for refutation then.
Apart from that, the patch seems to be going in the right direction. It should be possible to split up the RangeSet refactoring into a different review, for easier reviewing and better commit history.
Done in D45920.
I'll update this patch shortly.
include/clang/StaticAnalyzer/Core/AnalyzerOptions.h | ||
---|---|---|
284 | The option name should be more self-explanatory, post-processing in general can mean anything | |
586 | Same here | |
include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h | ||
349 | LLVM coding standart mandates capital case for field names. | |
lib/StaticAnalyzer/Core/AnalyzerOptions.cpp | ||
301 | Same for the option name. "crosscheck-with-z3"? | |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2343 | Is this field actually necessary? Do we ever check the same bug report with the same visitor multiple times? | |
2352 | For the initial version I would just do all work in the visitor, but that's a matter of taste. | |
lib/StaticAnalyzer/Core/ProgramState.cpp | ||
86 ↗ | (On Diff #143440) | Would then we crash on NPE if getRefutationManager is called? Getters should preferably not cause crashes. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1268 ↗ | (On Diff #143440) | I wouldn't even bother with this branch, but again, a matter of taste. |
1273 ↗ | (On Diff #143440) | Why OR? Shouldn't it be AND? |
Well, in the worst case we would still be able to split our full system of equations into smaller chunks, and it'd most likely still be better than solving roughly-the-same system of equations ~100ish times.
Expression chaining is fixed. The visitor now collects constraints that are about to disappear along the bug path and checks them once in the end.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2343 | I believe this function is called for each node on the bug path. I have a similar field to indicate the first visited node in the new version, but there may exist a better solution for that as well. | |
2352 | I think that doing all the work in the visitor would need exposing even more of Z3ConstraintManager's internals as of RangedConstraintManager. I tried to keep such changes minimal. | |
lib/StaticAnalyzer/Core/ProgramState.cpp | ||
86 ↗ | (On Diff #143440) | Um, currently yes, it will give a backend error if clang isn't built with Z3, but the option is on. |
FYI the fix for the 1-bit APSInt issue is in https://reviews.llvm.org/D35450#change-ifYnQ3IlVso
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
1261 ↗ | (On Diff #145762) | for (auto I : CR)? |
Added test cases and updated the analyzer-config tests with the new crosscheck flag.
Currently, there is one test failing that does not fail when building without the crosscheck:
llvm/tools/clang/test/Driver/response-file.c:18:10: error: expected string not found in input // LONG: extern int it_works; ^ <stdin>:1:1: note: scanning from here clang version 7.0.0 (trunk 333352) (llvm/trunk 333374) ^ <stdin>:8:3: note: possible intended match here Selected GCC installation: /usr/lib/gcc/x86_64-redhat-linux/6.4.1 ^
Please resubmit with -U999 diff flag (or using arc)
include/clang/StaticAnalyzer/Core/BugReporter/BugReporterVisitors.h | ||
---|---|---|
362 | Can we have the whole class inside the .cpp file? It's annoying to recompile half of the analyzer when an internal implementation detail changes | |
365 | I'm really not convinced we need this boolean field | |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
21 ↗ | (On Diff #148828) | NB: diff should be resubmitted with -U999, as phabricator shows "context not available" |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1261 ↗ | (On Diff #145762) | @mikhail.ramalho yes please do fix this one |
test/Analysis/z3-crosscheck.c | ||
---|---|---|
3 | Could we also have a second RUN line without Z3, and then use ifdef's to differentiate between the two in tests? |
- Moved FalsePositiveRefutationBRVisitor::Profile definition to BugReporterVisitor.cpp
- Update test cases two run twice, with and without the crosscheck
- Removed the FirstNodeVisited flag (the solver is being reset after checking the bug reachability)
- Use ranged loop when adding the constraints
lib/StaticAnalyzer/Core/BugReporter.cpp | ||
---|---|---|
3153 | Unless I'm mistaken, visitors are run in the order they are being declared. Probably LikelyFalsePositiveSuppressionBRVisitor should be even before that. | |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2382 | (apologies in advance for nitpicking not on your code). Currently, this is written in a stateful way: we have a solver, at each iteration we add constraints, and at the end we reset it. To me it would make considerably more sense to write the code in a functional style: as we go, generate a vector of formulas, then once we reach the path end, create the solver object, check satisfiability, and then destroy the entire solver. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
923 ↗ | (On Diff #148969) | @mikhail.ramalho I know the first version was not yours, but could you write a doxygen comment explaining the semantics of all parameters? (I know we are guilty for not writing those often). I am also quite confused by the semantics of OnlyPurged variable. |
1249 ↗ | (On Diff #148969) | solver can also return "unknown", what happens then? |
1259 ↗ | (On Diff #148969) | TBH I'm really confused here. Why does the method take two constraint ranges? What's OnlyPurged? From reading the code it seems it's set by seeing whether the program point only purges dead symbols, but at least a comment should be added as to why this affects behavior. |
1264 ↗ | (On Diff #148969) | I would guess that this is an optimization done in order not to re-add the constraints we already have. |
1267 ↗ | (On Diff #148969) | almost certainly a bug, we shouldn't default to unfeasible when the list of constraints is empty. |
1278 ↗ | (On Diff #148969) | I'm really curious where does it happen and why. |
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h | ||
---|---|---|
496 ↗ | (On Diff #148969) | See the comment below, I think we should not have this manager here. Just create one in the visitor constructor. |
563 ↗ | (On Diff #148969) | This should be deleted as well (see the comment above) |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2382 | Elaborating more: we are already forced to have visitor object state, let's use that. RefutationMgr is essentially a wrapper around a Z3 solver object, let's just create one when visitor is constructed (directly or in unique_ptr) and then rely on the destructor to destroy it. | |
lib/StaticAnalyzer/Core/ProgramState.cpp | ||
83 ↗ | (On Diff #148969) | This could be removed as well (see the comment above) |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
919 ↗ | (On Diff #148969) | reset should be removed, see comments above. |
1246 ↗ | (On Diff #148969) | I would remove this, see comments above. |
1292 ↗ | (On Diff #148969) | I'm very confused as to why are we doing disjunctions here. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
1292 ↗ | (On Diff #148969) | I think this corresponds to RangeSet being a union of Ranges. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
1249 ↗ | (On Diff #148969) | If it returns Z3_L_UNDEF, e.g. in case of a timeout, this assumes that the state was feasible because we couldn't prove the opposite. In that case the report won't be invalidated. |
1259 ↗ | (On Diff #148969) | The logic was:
So OnlyPurged is meant to signal that we only want to add those symbols to the solver that are getting purged from the program state. |
1278 ↗ | (On Diff #148969) |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | Note that while constructing the constraint solver here might make perfect sense now, it also inhibits incremental solving. If we do not plan to experiment with incremental solvers anytime soon I am fine with this direction as well. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1264 ↗ | (On Diff #148969) | Note that we are using lots of domain knowledge here like we have the most info about a symbol just before it dies. Also This optimization is a single lookup on the symbol level. I am not sure if Z3 could deal with this on the symbol level. It might need to do this on the constraint level. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | @xazax.hun Right, I see. Now I don't think incremental solving would help, and I don't think that having a global solver object would be helpful for it. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1249 ↗ | (On Diff #148969) | @rnkovacs that's possibly valid (though the exact behavior might need to be behind an option), but the current implementation is wrong and the decision should be made at a different stack level. |
1259 ↗ | (On Diff #148969) | @rnkovacs right, so that's an optimization not to add extra constraints?
|
1264 ↗ | (On Diff #148969) |
What do you mean? I'm positive adding redundant constraints to Z3 would not slow solving down.
"Premature optimization is a root of (most) evil". |
1278 ↗ | (On Diff #148969) | @rnkovacs right, so this is a workaround for an existing bug? In that case a FIXME with a link to the revision you have mentioned should be added. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | Just a bit of context and to have some expectation management regarding this patch. The main purpose of this implementation was to back a thesis. It was made under a very serious time pressure and the main goal was to be able to measure on real world projects as soon as possible and in the meantime to be flexible so we can measure multiple configurations (like incremental solving). So the goal was a flexible proof of concept that is sensible to measure in the shortest possible time. After the thesis was done, Reka started to work an another GSoC project, so she had no time to review the code with the requirements of upstreaming in mind. Nevertheless we found that sharing the proof of concept could be useful for the community. So it is perfectly reasonable if you disagree with some design decisions behind this patch, because the requirements for the thesis (in the short time frame) was very different from the requirements of upstreaming this work. In a different context these decisions made perfect sense. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1259 ↗ | (On Diff #148969) | We would not just add the very same constraints over and over again. We would first add the strongest possible constraint that the analyzer could infer first and later on add weaker and weaker ones. Does Z3 do some special handling for that as well? |
1264 ↗ | (On Diff #148969) | While I do agree that we should not optimize prematurely this is not just an optimization. Having a minimal set of constraints is also useful for debugging when dumping the set of constraints that Z3 tries to solve. Also, I think this optimization is quite simple, so I do not see removing it making the code much simpler. This is the reason why I would live to see some performance numbers first. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2382 | @xazax.hun of course. My comments are for @mikhail.ramalho who is now working on this patch. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
1259 ↗ | (On Diff #148969) | @xazax.hun Since all subformulas are hash-consed, I would be extremely surprised if this heuristic affected performance in a large way. However, your approach could be still useful in order to get readable dumps from the solver. // a comment explaining the logic if (State.succ_size() == 0 || State.getLocation().isPurged()) solver.addConstraints(state) |
1264 ↗ | (On Diff #148969) | OK your second point is valid, I think I've replied to it in my comment above. |
@xazax.hun (I'll reply here to avoid scattering the conversation across many subtrees)
I was thinking about the optimization for not adding redundant constraints some more, and I've decided I'm still against it ---
we are creating a higher potential for bugs, and we are tightly coupling the visitor to an internal implementation detail (all formulas are eventually purged at purge locations),
which creates a more fragile code.
The proper way to do this would be to have a set of constraints, and then add all constraints there as we iterate through the states (and through constraints inside the state).
If we use the hashing function provided by Z3, the simple act of construction of a set would implicitly drop all redundant constraints.
@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.
I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?
(Since the visitor is running backward we will add s > 20 constraint first, but this should be irrelevant for the deduplication I guess.)
@mikhail.ramalho In any case, the discussion here just further highlights that this optimization should be dropped from the initial patch, and if anything applied in a subsequent revision.
Seams reasonable.
I am not not sure that I got the idea what are you suggesting here. If we have the constraint of for example a symbol s > 10 and later on a path we discover s > 20, will we also deduplicate this that way?
No. But I thought in your optimization atoms inside the constraints would be the same?
Could you give an example where they are not?
So the logic in the current patch would be the following. When the symbol s dies it will be cleaned up from the state. For each symbol we will find the state where it was cleaned up. We will add the constraint for that symbol from the state before the cleanup which will contain the constraint s > 20. This is the only point where we add the constraints regarding the symbol s, so s > 10 later on while we are traversing the path backwards will not be added.
Code to trigger this behavior:
void f(int s) { if (s > 10) { // ... if (s > 20) { // trigger a warning } } }
So the point of this optimization is to only add the ranges of a symbol once, where we have the most information about it. So strictly speaking it is not a deduplication on the constraint level but on the symbol level.
So strictly speaking it is not a deduplication on the constraint level but on the symbol level.
Right, apologies, I was initially mistaken then.
That's not even deduplication, I would call it using the interval solver to guide the constraint selection for the SMT solver.
That makes sense, but I'm worried about tight coupling between different features, and classes of bugs which may arise due to that.
It would be great to have it in a separate patch then.
- Simplified refutation process: it now collects all the constraints in a given path and, only when it reaches the root node, the refutation manager is created and the constraints are checked for reachability. All the optimizations were removed.
- Moved RangedConstraintManager.h to include/
- Moved refutation check to be the first in the list of BugVisitors
- Added dump method to Z3Solver (to print the formula)
- Added more documentation/comments
Hi,
Just a bit of context and to have some expectation management regarding
this patch. The main purpose of this implementation was to back a thesis.
It was made under a very serious time pressure and the main goal was to be
able to measure on real world projects as soon as possible and in the
meantime to be flexible so we can measure multiple configurations (like
incremental solving).So the goal was a flexible proof of concept that is sensible to measure in
the shortest possible time. After the thesis was done, Reka started to work
an another GSoC project, so she had no time to review the code with the
requirements of upstreaming in mind. Nevertheless we found that sharing the
proof of concept could be useful for the community. So it is perfectly
reasonable if you disagree with some design decisions behind this patch,
because the requirements for the thesis (in the short time frame) was very
different from the requirements of upstreaming this work. In a different
context these decisions made perfect sense.
Just want to comment here and give thanks again for the first version of
the refutation code. It's being really helpful to develop the approach this
code as a base; things would definitely be slower if I had to start it from
scratch.
Thanks!
@mikhail.ramalho Thanks for this note, it's very nice of you :)
I'm glad if it saves a bit of time, but it's only a rough sketch, so please feel free to tailor it to your liking (and the reviewers' of course).
Thanks, this is going in the right direction!
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
182 ↗ | (On Diff #149317) | We don't need reset anymore. |
183 ↗ | (On Diff #149317) | Making those virtual does not make much sense to me. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2378 |
| |
2392 | That would be checking all constraints in all nodes one by one. I thought the idea was to encode all constraints from the entire path and then check all of it. | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
889 ↗ | (On Diff #149317) | 👍 |
925 ↗ | (On Diff #149317) | We don't need reset anymore. |
928 ↗ | (On Diff #149317) | The semantics of this method is incorrect. It should return a tri-value somehow (e.g. Optional<bool>, and then higher-level logic in visitor should decide what to do with it.) |
1272 ↗ | (On Diff #149317) | Since https://reviews.llvm.org/D47603 has landed we should drop this branch. |
1282 ↗ | (On Diff #149317) | /*RetTy=*/nullptr |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
928 ↗ | (On Diff #149317) | We could also use ConditionTruthVal for this purpose. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2392 | All the constraints are being added in the previous for loop, isModelFeasible only calls check(). | |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
925 ↗ | (On Diff #149317) | We don't need it but there's no reason to remove it, right? I might be useful in the future. |
lib/StaticAnalyzer/Core/Z3ConstraintManager.cpp | ||
---|---|---|
925 ↗ | (On Diff #149317) | We try to keep the code as small and as simple as possible so that it still achieves the task -- under that logic, unused methods should not be added. |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Z3ConstraintManager is fully contained inside a .cpp file, so we need isModelFeasible and addRangeConstraints to be exposed via its base class. Another solution is to split Z3ConstraintManager into a .h and a .cpp file and include the header. We would then be able to use it directly, instead of through a ConstraintManager object. I honestly prefer the latter. What do you think? |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Yeah, I think we would need a header here. |
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
2392 | Ah right, I see we are inside of the branch when pred_size() == 0. |
include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h | ||
---|---|---|
183 ↗ | (On Diff #149317) | Cool, I'll create a separate patch for that then. |
@mikhail.ramalho I assume you know it, but just in case, you can mark dependencies in phabricator by adding "parent" revisions.
- Simplified the API even further by constructing a Z3ConstraintManager object directly.
- Update isModelFeasible to return a isModelFeasible
- Update code with the fix for 1-bit long integer
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2366 | I'm not happy about this cast. Suggestions are welcome. |
I updated the test case as the cross check is not marking the true bug as invalid anymore.
Awesome! Does it mean that the optimization for adding less constraints was in fact buggy?
My make clang-test is still failing Driver/response-file.c whenever I compile clang with Z3. I'll update the patch as soon as I find the reason why.
- You shouldn't use make, use ninja (also make sure you use gold linker, default linker takes forever on Linux)
- Could it be something unrelated to your changes? Any given trunk version can be buggy, but usually those are resolved very fast, so if you update now the issue can go away.
Watching cfe-commits mailing list might be helpful there.
Otherwise looks good apart from minor naming nit! I guess we could figure out the casting issue later.
lib/StaticAnalyzer/Core/BugReporterVisitors.cpp | ||
---|---|---|
2360 | we would need a more descriptive name, e.g. isUnfeasible or similar. | |
2366 | well yeah, CreateZ3ConstraintManager should return an SMTConstraintManager. |
Awesome! Does it mean that the optimization for adding less constraints
was in fact buggy?
I pretty sure it was not related to the optimizations, I removed them days
ago (in the previous version of this patch) and the bug was still there.
- Could it be something unrelated to your changes? Any given trunk
version can be buggy, but usually those are resolved very fast, so if you
update now the issue can go away.Watching cfe-commits mailing list might be helpful there.
I update my repo every other day and it's been happening for the past
two/three weeks :/
The compiler shows the following error:
posix_spawn failed: Argument list too long
There are some discussions in several places about it.
I pretty sure it was not related to the optimizations, I removed them days
ago (in the previous version of this patch) and the bug was still there.
OK so any idea what the change could have been? Clearly the bug was there but not now. Anyway, should be OK to commit now.
I update my repo every other day and it's been happening for the past
two/three weeks :/
If it happens with your patch reverted as well, then it's unrelated, and we should just commit.
The option name should be more self-explanatory, post-processing in general can mean anything