We should track non-equivalency (disequality) in the case of greater-than or
less-than assumptions.
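To illustrate the intent, a minimal test case (hypothetical, in the spirit of the repro further below): once the analyzer assumes a > b, it should also know a != b, so the equality branch becomes infeasible:

    void clang_analyzer_warnIfReached(); // analyzer test helper

    void impliedDisequality(int a, int b) {
      if (a > b) {
        // Assuming a > b implies a != b; with the disequality tracked,
        // this branch is recognized as infeasible.
        if (a == b)
          clang_analyzer_warnIfReached(); // no-warning
      }
    }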
I came up with exactly the same fix! Great job!
I just wanted to refactor it so that we don't have

    if (New.isEmpty()) // this is an infeasible assumption
      return nullptr;
    ProgramStateRef NewState = setConstraint(St, Sym, New);
    return trackNE(NewState, Sym, Int, Adjustment);

repeated in different places.
After this, "accepted" and a huge thank you 😄
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:1349–1379
I suggest changing these two functions this way, so we can avoid the same pattern in four different functions.
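For reference, a sketch of the kind of helper that could fold the repeated pattern into one place (the method name and its placement on RangeConstraintManager are invented; setConstraint and trackNE are the functions quoted above):

    // Hypothetical helper: apply the narrowed range, bail out if the
    // assumption is infeasible, and record the implied disequality.
    ProgramStateRef RangeConstraintManager::setRangeAndTrackNE(
        ProgramStateRef St, SymbolRef Sym, RangeSet New,
        const llvm::APSInt &Int, const llvm::APSInt &Adjustment) {
      if (New.isEmpty()) // infeasible assumption
        return nullptr;
      ProgramStateRef NewState = setConstraint(St, Sym, New);
      return trackNE(NewState, Sym, Int, Adjustment);
    }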
Thanks for the quick review!
I updated the patch according to your suggestion, so we avoid repeating the same pattern.
What are our options for mitigating anything similar in the future?
As it stands, any change touching the SymbolicRangeInferrer and any related parts of the analyzer seems way too fragile.
Especially since we might want to add support for comparing SymSyms, just like we try to do in D77792.
What about changing the EXPENSIVE_CHECKS behavior of the assume function in the following way:
convert all range constraints into a Z3 model and check whether it is UNSAT.
If it is, we were about to return a state with contradictions, so this check would prevent this particular bug from lurking around to bite us later.
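A rough sketch of what that check could look like, modeled on the constraint encoding in FalsePositiveRefutationBRVisitor; the access to the ConstraintRange trait and the surrounding plumbing are assumptions, since the trait is private to RangeConstraintManager.cpp:

    // Sketch: encode every range constraint of the state and ask Z3 whether
    // their conjunction is satisfiable. Assumed to live inside
    // RangeConstraintManager.cpp, where the ConstraintRange trait is visible.
    static bool isStateFeasibleByZ3(ProgramStateRef State, ASTContext &Ctx) {
      llvm::SMTSolverRef Solver = llvm::CreateZ3Solver();
      for (const auto &C : State->get<ConstraintRange>()) {
        // A RangeSet is a union of intervals; encode it as a disjunction of
        // [From, To] membership tests over the symbol. Assumes the stored
        // RangeSet is never empty, which setConstraint should guarantee.
        auto I = C.second.begin();
        llvm::SMTExprRef Expr = SMTConv::getRangeExpr(
            Solver, Ctx, C.first, I->From(), I->To(), /*InRange=*/true);
        for (++I; I != C.second.end(); ++I)
          Expr = Solver->mkOr(Expr,
                              SMTConv::getRangeExpr(Solver, Ctx, C.first,
                                                    I->From(), I->To(),
                                                    /*InRange=*/true));
        Solver->addConstraint(Expr);
      }
      // check() returns None for "unknown"; treat that as feasible here.
      return Solver->check().getValueOr(true);
    }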
Another possibility would be to create a debug checker that registers for the assume callback and does the same conversion and check.
This is somewhat more appealing to me, as it decouples the Z3 dependency from the ConstraintManager header.
Which approach should I prefer? @NoQ @vsavchenko @martong @xazax.hun @Szelethus
I like the second approach, i.e. to have a debug checker. But I don't see how this checker would validate all constraints at the moment they are added to the State. And if we don't check all constraints, then we might end up checking a state that has been invalid for a while (i.e. one that became invalid earlier, and not because of the last added constraint). So, essentially, my gut feeling is that both approaches should validate all newly added constraints against Z3. And that might be too slow: it would have the same speed as using Z3 instead of the range-based solver.
The ProgramStateRef evalAssume(ProgramStateRef State, SVal Cond, bool Assumption) checker callback is called after any assume call.
So we would have a State which has all(?) the constraints stored in the appropriate GDM. I'm not sure if we should aggregate all the constraints up to this node, like we do for refutation. I think in this case we should just make sure that the current state does not contain any contradiction.
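A skeleton of such a debug checker (the checker name and registration function are hypothetical; it reuses the isStateFeasibleByZ3 helper sketched above):

    // Hypothetical debug checker: after every assume, cross-check the
    // resulting state's range constraints with Z3 and abort on contradiction.
    class DebugUnsatStateChecker : public Checker<eval::Assume> {
    public:
      ProgramStateRef evalAssume(ProgramStateRef State, SVal Cond,
                                 bool Assumption) const {
        ASTContext &Ctx = State->getStateManager().getContext();
        if (!isStateFeasibleByZ3(State, Ctx))
          llvm::report_fatal_error("analyzer produced an UNSAT state");
        return State; // feasible: leave the state untouched
      }
    };

    void ento::registerDebugUnsatStateChecker(CheckerManager &Mgr) {
      Mgr.registerChecker<DebugUnsatStateChecker>();
    }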
Here is my proof-of-concept:
I've dumped the state and the arguments of evalAssume for the repro example, pretending that on the infeasible path we don't reach a code path that would crash:
    void avoidInfeasibleConstraintforGT(int a, int b) {
      int c = b - a;
      if (c <= 0)
        return;
      if (a != b) {
        clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
        return;
      }
      clang_analyzer_warnIfReached(); // eh, we will reach this..., #1
      // These are commented out, to pretend that we don't find out that
      // we are on an infeasible path...
      // a == b
      // if (c < 0)
      //   ;
    }
We reach line #1, but more importantly, we get the following state dumps:
    evalAssume: assuming Cond: (reg_$1<int a>) != (reg_$0<int b>) to be true in state:
    "program_state": {
      "constraints": [
        { "symbol": "(reg_$0<int b>) - (reg_$1<int a>)", "range": "{ [1, 2147483647] }" },
        { "symbol": "(reg_$1<int a>) != (reg_$0<int b>)", "range": "{ [-2147483648, -1], [1, 2147483647] }" }
      ]
    }

    evalAssume: assuming Cond: (reg_$1<int a>) != (reg_$0<int b>) to be false in state:
    "program_state": {
      "constraints": [
        { "symbol": "(reg_$0<int b>) - (reg_$1<int a>)", "range": "{ [1, 2147483647] }" },
        { "symbol": "(reg_$1<int a>) != (reg_$0<int b>)", "range": "{ [0, 0] }" }
      ]
    }
As you can see, the latter is the infeasible path: b - a is constrained to [1, 2147483647], while a != b being false means b - a == 0, which is a contradiction. If we had serialized the constraints and let Z3 check them, it would have reported them unsatisfiable.
At this point the checker can dump any useful data for debugging and crash the analyzer to let us know that something really bad happened.
> [...] And that might be too slow: it would have the same speed as using Z3 instead of the range-based solver.
Yes, it will probably be freaking slow, but at least we would have something.