a % b != 0 implies that a != 0 for any a and b. This patch
extends the ConstraintAssignor to do just that. In fact, we could do
something similar with division and in case of multiplications we could
have some other inferences, but I'd like to keep these for future
patches.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
I had to move the definition of ConstraintAssignor after the definition of RangeConstraintManager b/c I am using assumeSymNE in the new logic. Unfortunately, the diff does not show clearly the changes inside the moved hunk, so I try to indicate the important changes with a New code comment.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1740 | New code | |
1750–1764 | New code | |
1767–1770 | New code |
- Break out the movement of RangeConstraintManager into a parent patch, this way the diff here is clearly visible and makes the review easier.
Great work!
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1601–1606 | Hm, why don't we acquire RCM, Builder, F in the constructor? I'm expecting all of them to remain the same for all the assign() calls. | |
1613–1614 | Why is this not a const member function? | |
1615 | I think you should also implement a % b == 0 implies that a == 0. | |
1639–1641 | IMO we should pass a reference here, just like the rest of the parameters. | |
clang/test/Analysis/constraint-assignor.c | ||
19 | It's still mindboggling that we need to do this. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1601–1606 | My bad. It's a static method. Ignore this thread. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1613–1614 | IMO it's better to rename the function handleRemainderOp. Add a function description in comments above. | |
1619–1622 | Maybe make some more complex assumptions to cover complex LHS's? | |
1719 | ||
clang/test/Analysis/constraint-assignor.c | ||
31 | Add some nested cases like x % y % z == 0. |
clang/test/Analysis/constraint-assignor.c | ||
---|---|---|
19 | This is needed because of RemoveDeadBindings. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1619–1622 | Oh nice. | |
clang/test/Analysis/constraint-assignor.c | ||
19 | I know, I just wanted to highlight that in contrast to this test code, on real code we would not draw the same conclusion, since we reclaimed the constraints too early. And this is what worries me. |
- Handle a % b == 0 implies that a == 0.
- Add more test cases
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1613–1614 |
Because RCM.assumeSymNE is not const.
I agree, changed it. | |
1615 | Good point, I've just added that. | |
1619–1622 | State->assume goes through many higher level abstractions and finally calls assumeSymNE, so I think calling that would be a pessimization in this case. | |
clang/test/Analysis/constraint-assignor.c | ||
31 | Good point! I've added one such test case. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1619–1622 | I agree, but then you lose an internal simplification of LHS symbol. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1613–1614 |
I don't think it's an excuse: https://godbolt.org/z/nEcsozheq Aside from that, I would clarify that it will try to apply the following *two* assumptions: | |
1619 | ||
1623 | In case the Constraint might be zero, why do you constrain LHS so that it must be zero? | |
1626–1628 |
BWT the following lines are uncovered by tests: L1627, L1651, L1758
Please adjust your tests accordingly.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1619 | Since SymbolRef actually is already const. (using SymbolRef = const SymExpr *;) |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1613–1614 |
Yeah you're right. It has nothing to do with the non-constness of RCM.assumeSymNE, my bad. The problem is that we try to assign a new state to the member non-const State and assignment is non-const: No viable overloaded '=' /home/egbomrt/WORK/llvm4/git/llvm-project/llvm/include/llvm/ADT/IntrusiveRefCntPtr.h:188:23: note: candidate function not viable: 'this' argument has type 'const clang::ento::ProgramStateRef' (aka 'const IntrusiveRefCntPtr<const clang::ento::ProgramState>'), but method is not marked const
Ok, I've added those comments. | |
1619 | Fair enough. | |
1619–1622 | Well, I cannot come up any disadvantages of using`State->assume` other than some performance constraints. On the other hand, that could simplify the implementation by eliminating the need for RCM. I'll give it a try and perhaps will update the patch accordingly. | |
1623 | Ahh, very good catch, sharp eyes! This mistake slipped in when I extended the original logic with a % b == 0 implies that a == 0. Fixed it. |
Excellent! All lines are covered.
Great job.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1619 | They mean different things. What I wanted is to declare both the pointer and the pointee const. |
Ok. Let's see what the benefits it brings.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1619 | Oh, I see. const T * const |
According to our measurements, it has some effects but is probably difficult to draw clear conclusions.
By using the github/martong/constraint_assignor_rem, it seems like the runtime is within measurable errors, and the memory consumption and the total result count remained approximately the same.
It seems like we have more results since we more often have more concrete values instead of unsimplified symbols, which makes the analyzer 'smarter' and reports more problems. At least, that's my theory.
Sorry for not sharing directly the csa-testbench summary HTML, but that could contain sensitive information.
If you require, I could repeat the test and publish the results to the publicly reachable demo server to let you all inspect and verify my theory.
Additionally to my previous observation, a surprising amount of the new findings are of deadcode detections, and most of them there are loops.
Other than that, I've seen a true-positive report as well:
At line 'A', on the path where valid_modulus is assumed to be true, we now correctly constrain spec->modulus to zero. The only problem is that we never mention this in the bugreport xD
That being said, I would recommend introducing a bug-report visitor to complement this surprising behavior, OR somehow add a NoteTag to this transition where this operation gets evaluated.
I suspect we will need to tune the trackExpressionValue stuff to keep up, whenever we extend ConstraintAssignor.
After taking a look at the new findings we discovered, there is a logic error with this patch, actually a % b == 0 implies that a == 0 does not hold, one counter example is 10 % 2 == 0. Argh, probably we should be using Z3 next time to prove or disprove such things. Or perhaps other reviewers with strong math background could have had a look (knock knock @NoQ :).
- Remove the wrong inferrence of a % b == 0 implies that a == 0 and related test cases.
I see. Now it looks correct.
Next time we shall have a z3 proof about the theory.
A => B <=> not(A) or B. which is SAT only if A and not(B) UNSAT.
a = z3.BitVec('a', 32) b = z3.BitVec('b', 32) zero = z3.BitVecVal(0, 32) s = z3.Solver() s.add((a % b) != zero) s.add(a == zero) s.check() # reports UNSAT
Note: it requires the z3-solver pip package.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1618–1627 | How about using the family of ProgramState::isNonNull or ProgramState::isNull or RangeConstraintManager::checkNull functoins for this stuff? |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1618–1627 | I've been checking this and turend out that ProgramState::isNull does not modify the State (this is aligned with being a const member function). So, these functions do not "assume" anything, they can be used only to query some property of an SVal (or Symbol) from the State. However, this comment and your other previous comment made me to do further investigations towards exploiting the "assume" machinery better. The result is a new child patch, where we can handle "adjustments" as well. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1618–1627 | But I don't see you use the modified State in any way. Why it's important for you to change the State? | |
1618–1627 | Let me suggest possible changes. | |
1619–1620 | Howerver, put this line inside if-body below, since Zero isn't needed wherever else. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1618–1627 |
Actually, we do use it. The State we overwrite here is the member of the class ConstraintAssignor, it is not a local variable.
It is important, because we'd like to assign new information to the existing things we know (i.e. to the State). So, once we see a modulo then we can defer some extra constraints and that is done via the ConstraintAssignor. | |
1619–1620 | Thanks, that is correct. I am going to address this in the child patch. |
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
1618–1627 | OK. I see. Thanks :) |
Hm, why don't we acquire RCM, Builder, F in the constructor? I'm expecting all of them to remain the same for all the assign() calls.