Make the SimpleSValBuilder to be able to look up and use a constraint
for an operand of a SymbolCast, when the operand is constrained to a
const value.
This part of the SValBuilder is responsible for constant folding. We
need this constant folding, so the engine can work with less symbols,
this way it can be more efficient. Whenever a symbol is constrained with
a constant then we substitute the symbol with the corresponding integer.
If a symbol is constrained with a range, then the symbol is kept and we
fall-back to use the range based constraint manager, which is not that
efficient. This patch is the natural extension of the existing constant
folding machinery with the support of SymbolCast symbols.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|
1359 | Hoist this common subexpression. | |
clang/test/Analysis/svalbuilder-casts.cpp | ||
10–11 | Please ass a test case where these events happen in the execution path:
| |
31 | ||
35–38 | clang_analyzer_eval(x == 0); // expected-warning {{UNKNOWN}} | |
44 | ||
45–46 | It took me a while to get that the short variable has nothing to do with the test. | |
47–50 | clang_analyzer_eval(x == 1); // expected-warning {{UNKNOWN}} clang_analyzer_eval(x == -1); // expected-warning {{UNKNOWN}} clang_analyzer_eval(x == 0); // expected-warning {{FALSE}} clang_analyzer_eval(x == 65537); // expected-warning {{FALSE}} clang_analyzer_eval(x == -65537); // expected-warning {{FALSE}} clang_analyzer_eval(x == 131073); // expected-warning {{FALSE}} clang_analyzer_eval(x == -131073); // expected-warning {{FALSE}} |
- Hoist S->getOperand, rework the test file
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|
1359 | Ok. | |
clang/test/Analysis/svalbuilder-casts.cpp | ||
10–11 | Here is the test you'd like. void testA(int x) { short s = x; assert(x > 0); clang_analyzer_eval(s > 0); // expected-warning{{UNKNOWN}} should be TRUE } However, this will not pass, because x is constrained with a range, and in the Simplifier we do constant folding, i.e. the range must collapse to a single value. | |
45–46 | Ok, I added a hunk for the implicit cast. | |
47–50 | Thanks, I've added these cases.
Actually, 1 and -1 does not cast to 0 as short, so these should be FALSE. Updated like so. |
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|
1354 | Should this imply to use the root symbol and not the second nested one? |
This part of the SValBuilder is responsible for constant folding. We need this constant folding, so the engine can work with less symbols, this way it can be more efficient. Whenever a symbol is constrained with a constant then we substitute the symbol with the corresponding integer. If a symbol is constrained with a range, then the symbol is kept and we fall-back to use the range based constraint manager, which is not that efficient. This patch is the natural extension of the existing constant folding machinery with the support of SymbolCast symbols.
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|
1354 | We need the operand, with your words it is the second nested one, not the root. In case of (int)(short)(x) we need the (short)(x). |
I've just investigated this patch deeply. I think we have wrong logic here.
Assume we have (int)(short)(int x). VisitSymbolCast will try to get the constant recursively in the next order:
- (short)(int x)
- (int x)
And here is my concern. Whether it is correct to get the range for int x and consider it as a correct simplification of (int)(short)(int x). IMO it can't be simplified like that. It should go through the set of conventions and intersections.
Working on an integral cast feature I've encountered in the situation when I can get the range for (short)(int x) in the chain above. And it logically matches with the original symbol (int)(short)(int x). After that, your solution (this patch) stops and returns this range, missing the range associated with int x. That borns the problem:
this patch | after my patch |
(int)(short)(int x) = 0 | (int)(short)(int x) = 0 |
(short)(int x) = ? / go next | (short)(int x) = 0 / match |
(int x) = 1 / match | (int x) = 1 / doesn't reach because of the previous match |
result | result |
0 and 1 infeasible | 0 and 0 feasible |
I think we have to improve this behavior.
Assume we have (int)(short)(int x). VisitSymbolCast will try to get the constant recursively in the next order:
- (short)(int x)
- (int x)
And here is my concern. Whether it is correct to get the range for int x and consider it as a correct simplification of (int)(short)(int x). IMO it can't be simplified like that. It should go through the set of conventions and intersections.
I agree that it is not correct to do that if there is a range associated for the castee symbol, in those cases we should consult with the range based solver. However, this patch handles the case when a simple value (a range with exactly one element) is associated to the castee. And in that case, this ought to be correct, the cast is handled properly via these functions: SValBuilder::evalCast -> EvalCastVisitor::VisitNonLocConcreteInt -> APSIntType::apply.
Generally, the Simplifier::Visit functions should return a simplified symbol only if during the visitation we could find a constant value associated to any of the visited nodes, it does constant folding. If we simplify with a symbol that is constrained into a range and not into a simple value, then there is a bug, we should fix that.
@martong wrote:
I think you did get it. I'm not talking about range but about concrete int. And I see that the current behavior needs an improvement or rework.
Consider next code, which I used in my example:
1. void test(int x) { 2. assert((short)x == 0); 3. clang_analyzer_eval(x == 1); 4. }
Your patch does next:
- on the line 3 we know that (int)(short)(int x) = 0 and (int x) = 1
- simplify (int)(short)(int x). Try to get a constant for (short)(int x). Result is nothing because it is not presented in the range map. Continue unwrapping.
- go deeper and simplify (short)(int x). Try to get a constant for (int x). Result is 1. Stop visiting.
- return 1.
- intersect the range of the original symbol (int)(short)(int x) which is 0 and the range which is returned - 1
- result is an infeasible state.
My patch above yours does next:
- on the line 3 we know that (int)(short)(int x) = 0 and (int x) = 1, but now we also know that (short)(int x) = 0 as an equivalent for (int)(short)(int x) due to the improvement.
- simplify (int)(short)(int x). Try to get a constant for (short)(int x). Result is 0. Stop visiting.
- return 0.
- intersect the range of the original symbol (int)(short)(int x) which is 0 and the range which is returned - 0
- result is a feasible state.
Here what I'm saying. This is not about ranges. This simplification dosn't take into account that different operands of the cast symbol may have different associated ranges or constants. And what do we have to do with them in this case?
Yeah okay. I get it now. Thank you for your patience and your time on elaborating the issue.
First, I think we'd need to fabricate a test case that shows us the bug even without applying your patch (D103096).
Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096. But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.
Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?
But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.
IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what Sym it is associated. The main question, as I said, what we shall do with two different constants.
Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096.
Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?
But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.
IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what Sym it is associated. The main question, as I said, what we shall do with two different constants.
Do you have any ideas?
Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?
I think the SValBuilder should not reach such an ambiguity. Ideally, both (short)(int x) and (int x) should be constrained to the same value. If that is not the case then we already have an infeasible state. But it should not be the task of the SValBuilder to recognize this situation. So, when we add the second constraint then we should notice the contradiction immediately. This check should be done in the ConstraintAssignor.
What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096
I don't think anymore that this is would be the way forward, as I said, let's do the check for infeasibility in the ConstraintAssignor.
Actually, you already have a logic for checking if there is a contradiction in your D103096 patch, in ConstraintAssignor::updateExistingConstraints:
// Update constraints in the map which bitwidth is equal or lower then // `MinBitWidth`. if (CM) { for (auto &Item : *CM) { // Stop after reaching a bigger bitwidth. if (Item.first > MinBitWidth) break; RangeSet RS = RangeFactory.intersect(Item.second, CastTo(Item.first)); // If the intersection is empty, then the branch is infisible. if (RS.isEmpty()) { State = nullptr; return false; } NewCM = CMF.add(NewCM, Item.first, RS); } }
So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?
So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?
No. My current implementation doesn't contain this check in ConstraintAssignor in favor of the solution simplification. It'll be added in the next patches.
But still I think, this solution is not accomplished. Consider next. Say, symbol (char)(int x) is associated to the range [42, 42], and can be simplified to constant 42 of char. And (int x) is associated to the range [43, 43] Your patch will omit looking into the symbol itself and unwrap it starting visiting the operand instead. Thus, it will return the constant 43 for (int x).
Moreover, if (int x) is 43, it will contradict to 42 (aka infeasible state). We also have no decision what to return in this case.
For me, this is really uncertain patch that I'm not 100% sure in.
I don't see this case different to the unary expressions. Consider the unary minus for example. Let's say, the symbol a is constrained to [1, 1] and then -a is constrained to [-2, -2]. This is clearly an infeasible state and the analyzer will terminate the path. So, no further call of SValBuilder should happen from that point. That is, it is meaningless to evaluate -(-a) if there is a contradiction in the constraints that are assigned to the sub-symbols of the the symbol tree of -(-a).
Here is a test case that demonstrates this (this passes with latest llvm/main):
// RUN: %clang_analyze_cc1 %s \ // RUN: -analyzer-checker=core \ // RUN: -analyzer-checker=debug.ExprInspection \ // RUN: -analyzer-config eagerly-assume=false \ // RUN: -verify extern void abort() __attribute__((__noreturn__)); #define assert(expr) ((expr) ? (void)(0) : abort()) void clang_analyzer_warnIfReached(); void clang_analyzer_eval(int); void test(int a, int b) { assert(-a == -1); assert(a == 1); clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}} clang_analyzer_eval(-(-a) == 1); // expected-warning{{TRUE}} assert(-b <= -2); assert(b == 1); clang_analyzer_warnIfReached(); // unreachable!, no-warning clang_analyzer_eval(-(-b) == 1); // no-warning }
Should this imply to use the root symbol and not the second nested one?
E.g. from (int)(short)(x) do you want (short)(x) or (x)?
getOperand gives you (short)(x) in this case.