Make the SimpleSValBuilder to be able to look up and use a constraint

for an operand of a SymbolCast, when the operand is constrained to a

const value.

This part of the SValBuilder is responsible for constant folding. We

need this constant folding, so the engine can work with less symbols,

this way it can be more efficient. Whenever a symbol is constrained with

a constant then we substitute the symbol with the corresponding integer.

If a symbol is constrained with a range, then the symbol is kept and we

fall-back to use the range based constraint manager, which is not that

efficient. This patch is the natural extension of the existing constant

folding machinery with the support of SymbolCast symbols.

# Details

# Diff Detail

- Repository
- rG LLVM Github Monorepo

### Event Timeline

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|

1359 | Hoist this common subexpression. | |

clang/test/Analysis/svalbuilder-casts.cpp | ||

10–11 | Please ass a test case where these events happen in the execution path: - Have an unconstrained variable, such as a fn parameter
`x`. - Produce a Symbolic cast of
`x`, let's call it`s`. - Constrain
`x`to some value range, let's say it must be positive. - Verify that the value of the Symbolic cast
`s`is also constrained to be positive.
| |

31 | ||

35–38 |
| |

44 | ||

45–46 | It took me a while to get that the | |

47–50 | clang_analyzer_eval(x == 1); // expected-warning {{UNKNOWN}} clang_analyzer_eval(x == -1); // expected-warning {{UNKNOWN}} clang_analyzer_eval(x == 0); // expected-warning {{FALSE}} clang_analyzer_eval(x == 65537); // expected-warning {{FALSE}} clang_analyzer_eval(x == -65537); // expected-warning {{FALSE}} clang_analyzer_eval(x == 131073); // expected-warning {{FALSE}} clang_analyzer_eval(x == -131073); // expected-warning {{FALSE}} |

- Hoist S->getOperand, rework the test file

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|

1359 | Ok. | |

clang/test/Analysis/svalbuilder-casts.cpp | ||

10–11 | Here is the test you'd like. void testA(int x) { short s = x; assert(x > 0); clang_analyzer_eval(s > 0); // expected-warning{{UNKNOWN}} should be TRUE } However, this will not pass, because | |

45–46 | Ok, I added a hunk for the implicit cast. | |

47–50 | Thanks, I've added these cases.
Actually, |

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|

1354 | Should this imply to use the root symbol and not the second nested one? |

This part of the SValBuilder is responsible for **constant folding**. We need this constant folding, so the engine can work with less symbols, this way it can be more efficient. Whenever a symbol is constrained with a constant then we substitute the symbol with the corresponding integer. If a symbol is constrained with a range, then the symbol is kept and we fall-back to use the range based constraint manager, which is not that efficient. This patch is the natural extension of the existing constant folding machinery with the support of `SymbolCast` symbols.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp | ||
---|---|---|

1354 | We need the operand, with your words it is the second nested one, not the root. In case of |

I've just investigated this patch deeply. I think we have wrong logic here.

Assume we have `(int)(short)(int x)`. `VisitSymbolCast` will try to get the constant recursively in the next order:

`(short)(int x)``(int x)`

And here is my concern. Whether it is correct to get the range for `int x` and consider it as a correct simplification of `(int)(short)(int x)`. IMO it can't be simplified like that. It should go through the set of conventions and intersections.

Working on an integral cast feature I've encountered in the situation when I can get the range for `(short)(int x)` in the chain above. And it logically matches with the original symbol `(int)(short)(int x)`. After that, your solution (this patch) stops and returns this range, missing the range associated with `int x`. That borns the problem:

this patch | after my patch |

(int)(short)(int x) = 0 | (int)(short)(int x) = 0 |

(short)(int x) = ? / go next | (short)(int x) = 0 / match |

(int x) = 1 / match | (int x) = 1 / doesn't reach because of the previous match |

result | result |

0 and 1 infeasible | 0 and 0 feasible |

I think we have to improve this behavior.

Assume we have

(int)(short)(int x).VisitSymbolCastwill try to get the constant recursively in the next order:

(short)(int x)(int x)And here is my concern. Whether it is correct to get the range for

int xand consider it as a correct simplification of(int)(short)(int x). IMO it can't be simplified like that. It should go through the set of conventions and intersections.

I agree that it is not correct to do that if there is a **range** associated for the castee symbol, in those cases we should consult with the range based solver. However, this patch handles the case when a **simple** value (a range with exactly one element) is associated to the castee. And in that case, this ought to be correct, the cast is handled properly via these functions: `SValBuilder::evalCast` -> `EvalCastVisitor::VisitNonLocConcreteInt` -> `APSIntType::apply`.

Generally, the `Simplifier::Visit` functions should return a simplified symbol only if during the visitation we could find a constant value associated to any of the visited nodes, it does constant folding. If we simplify with a symbol that is constrained into a range and not into a simple value, then there is a bug, we should fix that.

@martong wrote:

I think you did get it. I'm not talking about range but about concrete int. And I see that the current behavior needs an improvement or rework.

Consider next code, which I used in my example:

1. void test(int x) { 2. assert((short)x == 0); 3. clang_analyzer_eval(x == 1); 4. }

Your patch does next:

- on the line 3 we know that
`(int)(short)(int x) = 0`and`(int x) = 1` - simplify
`(int)(short)(int x)`. Try to get a constant for`(short)(int x)`. Result is nothing because it is not presented in the range map.**Continue**unwrapping. - go deeper and simplify
`(short)(int x)`. Try to get a constant for`(int x)`. Result is 1.**Stop**visiting. - return 1.
- intersect the range of the original symbol
`(int)(short)(int x)`which is**0**and the range which is returned -**1** - result is an
**infeasible**state.

My patch above yours does next:

- on the line 3 we know that
`(int)(short)(int x) = 0`and`(int x) = 1`, but now we also know that`(short)(int x) = 0`as an equivalent for`(int)(short)(int x)`due to the improvement. - simplify
`(int)(short)(int x)`. Try to get a constant for`(short)(int x)`. Result is 0.**Stop**visiting. - return 0.
- intersect the range of the original symbol
`(int)(short)(int x)`which is**0**and the range which is returned -**0** - result is a
**feasible**state.

Here what I'm saying. This is not about ranges. This simplification dosn't take into account that different operands of the cast symbol may have different associated ranges or constants. And what do we have to do with them in this case?

Yeah okay. I get it now. Thank you for your patience and your time on elaborating the issue.

First, I think we'd need to fabricate a test case that shows us the bug even without applying your patch (D103096).

Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096. But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like `(short)(int x) = 0` and `(int x) = 1`. So that we would already know that it is impossible. What `SVal` should we return in such case? Undef? Unknown? Original one?

But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what `Sym` it is associated. The main question, as I said, what we shall do with two different constants.

Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096.

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like `(short)(int x) = 0` and `(int x) = 1`. So that we would already know that it is impossible. What `SVal` should we return in such case? Undef? Unknown? Original one?

But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what `Sym` it is associated. The main question, as I said, what we shall do with two different constants.

Do you have any ideas?

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?

I think the `SValBuilder` should not reach such an ambiguity. Ideally, both `(short)(int x)` and `(int x)` should be constrained to the same value. If that is not the case then we already have an infeasible state. But it should not be the task of the SValBuilder to recognize this situation. So, when we add the second constraint **then** we should notice the contradiction immediately. This check should be done in the `ConstraintAssignor`.

What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096

I don't think anymore that this is would be the way forward, as I said, let's do the check for infeasibility in the ConstraintAssignor.

Actually, you already have a logic for checking if there is a contradiction in your D103096 patch, in ConstraintAssignor::updateExistingConstraints:

// Update constraints in the map which bitwidth is equal or lower then // `MinBitWidth`. if (CM) { for (auto &Item : *CM) { // Stop after reaching a bigger bitwidth. if (Item.first > MinBitWidth) break; RangeSet RS = RangeFactory.intersect(Item.second, CastTo(Item.first)); // If the intersection is empty, then the branch is infisible. if (RS.isEmpty()) { State = nullptr; return false; } NewCM = CMF.add(NewCM, Item.first, RS); } }

So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?

So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?

No. My current implementation doesn't contain this check in ConstraintAssignor in favor of the solution simplification. It'll be added in the next patches.

But still I think, this solution is not accomplished. Consider next. Say, symbol `(char)(int x)` is associated to the range `[42, 42]`, and can be simplified to constant `42 of char`. And `(int x)` is associated to the range `[43, 43]` Your patch will omit looking into the symbol itself and unwrap it starting visiting the operand instead. Thus, it will return the constant 43 for `(int x)`.

Moreover, if `(int x)` is 43, it will contradict to 42 (aka infeasible state). We also have no decision what to return in this case.

For me, this is really uncertain patch that I'm not 100% sure in.

I don't see this case different to the unary expressions. Consider the unary minus for example. Let's say, the symbol `a` is constrained to `[1, 1]` and then `-a` is constrained to `[-2, -2]`. This is clearly an infeasible state and the analyzer will terminate the path. So, no further call of SValBuilder should happen from that point. That is, it is meaningless to evaluate `-(-a)` if there is a contradiction in the constraints that are assigned to the sub-symbols of the the symbol tree of `-(-a)`.

Here is a test case that demonstrates this (this passes with latest llvm/main):

// RUN: %clang_analyze_cc1 %s \ // RUN: -analyzer-checker=core \ // RUN: -analyzer-checker=debug.ExprInspection \ // RUN: -analyzer-config eagerly-assume=false \ // RUN: -verify extern void abort() __attribute__((__noreturn__)); #define assert(expr) ((expr) ? (void)(0) : abort()) void clang_analyzer_warnIfReached(); void clang_analyzer_eval(int); void test(int a, int b) { assert(-a == -1); assert(a == 1); clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}} clang_analyzer_eval(-(-a) == 1); // expected-warning{{TRUE}} assert(-b <= -2); assert(b == 1); clang_analyzer_warnIfReached(); // unreachable!, no-warning clang_analyzer_eval(-(-b) == 1); // no-warning }

Should this imply to use the root symbol and not the second nested one?

E.g. from

(int)(short)(x)do you want(short)(x)or(x)?getOperandgives you(short)(x)in this case.