This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Handle SymbolCast in SValBuilder
Needs RevisionPublic

Authored by martong on May 26 2022, 8:44 AM.

Details

Summary

Make the SimpleSValBuilder to be able to look up and use a constraint
for an operand of a SymbolCast, when the operand is constrained to a
const value.
This part of the SValBuilder is responsible for constant folding. We
need this constant folding, so the engine can work with less symbols,
this way it can be more efficient. Whenever a symbol is constrained with
a constant then we substitute the symbol with the corresponding integer.
If a symbol is constrained with a range, then the symbol is kept and we
fall-back to use the range based constraint manager, which is not that
efficient. This patch is the natural extension of the existing constant
folding machinery with the support of SymbolCast symbols.

Diff Detail

Event Timeline

martong created this revision.May 26 2022, 8:44 AM
Herald added a project: Restricted Project. · View Herald Transcript
martong requested review of this revision.May 26 2022, 8:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2022, 8:44 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
steakhal added inline comments.May 26 2022, 9:59 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1359

Hoist this common subexpression.

clang/test/Analysis/svalbuilder-casts.cpp
10–11

Please ass a test case where these events happen in the execution path:

  1. Have an unconstrained variable, such as a fn parameter x.
  2. Produce a Symbolic cast of x, let's call it s.
  3. Constrain x to some value range, let's say it must be positive.
  4. Verify that the value of the Symbolic cast s is also constrained to be positive.
31
35–38

clang_analyzer_eval(x == 0); // expected-warning {{UNKNOWN}}

44
45–46

It took me a while to get that the short variable has nothing to do with the test.
I would recommend extending the previous example with the short variable to demonstrate that the same thing (narrowing cast) can occur by either an explicit cast or some implicit casts like a variable initialization/assignment.

47–50
clang_analyzer_eval(x == 1);       // expected-warning {{UNKNOWN}}
clang_analyzer_eval(x == -1);      // expected-warning {{UNKNOWN}}
clang_analyzer_eval(x == 0);       // expected-warning {{FALSE}}
clang_analyzer_eval(x == 65537);   // expected-warning {{FALSE}}
clang_analyzer_eval(x == -65537);  // expected-warning {{FALSE}}
clang_analyzer_eval(x == 131073);  // expected-warning {{FALSE}}
clang_analyzer_eval(x == -131073); // expected-warning {{FALSE}}
martong updated this revision to Diff 432588.May 27 2022, 9:34 AM
martong marked 7 inline comments as done.
  • Hoist S->getOperand, rework the test file
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1359

Ok.

clang/test/Analysis/svalbuilder-casts.cpp
10–11

Here is the test you'd like.

void testA(int x) {
  short s = x;
  assert(x > 0);
  clang_analyzer_eval(s > 0); // expected-warning{{UNKNOWN}} should be TRUE
}

However, this will not pass, because x is constrained with a range, and in the Simplifier we do constant folding, i.e. the range must collapse to a single value.
This test will pass with the child patch (created by Denys).

45–46

Ok, I added a hunk for the implicit cast.

47–50

Thanks, I've added these cases.

clang_analyzer_eval(x == 1); expected-warning {{UNKNOWN}}
clang_analyzer_eval(x == -1);
expected-warning {{UNKNOWN}}

Actually, 1 and -1 does not cast to 0 as short, so these should be FALSE. Updated like so.

@martong As you said, my solution D103096 suppose to pass these and more other tests cases. So how it will help in combination with my solution D103096?
Although, your patch is really simple but it's more like a plug then a real SymbolCast support, isn't it? I don't quite understand the motivation.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1354

Should this imply to use the root symbol and not the second nested one?
E.g. from (int)(short)(x) do you want (short)(x) or (x)?
getOperand gives you (short)(x) in this case.

martong marked an inline comment as done.May 30 2022, 2:49 AM

@martong As you said, my solution D103096 suppose to pass these and more other tests cases. So how it will help in combination with my solution D103096?
Although, your patch is really simple but it's more like a plug then a real SymbolCast support, isn't it? I don't quite understand the motivation.

This part of the SValBuilder is responsible for constant folding. We need this constant folding, so the engine can work with less symbols, this way it can be more efficient. Whenever a symbol is constrained with a constant then we substitute the symbol with the corresponding integer. If a symbol is constrained with a range, then the symbol is kept and we fall-back to use the range based constraint manager, which is not that efficient. This patch is the natural extension of the existing constant folding machinery with the support of SymbolCast symbols.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1354

We need the operand, with your words it is the second nested one, not the root. In case of (int)(short)(x) we need the (short)(x).

ASDenysPetrov accepted this revision.May 30 2022, 9:37 AM

This part of the SValBuilder is responsible for constant folding. We need this constant folding, so the engine can work with less symbols, this way it can be more efficient. Whenever a symbol is constrained with a constant then we substitute the symbol with the corresponding integer. If a symbol is constrained with a range, then the symbol is kept and we fall-back to use the range based constraint manager, which is not that efficient. This patch is the natural extension of the existing constant folding machinery with the support of SymbolCast symbols.

Now I see. Thanks. I also wouldn't mind if you add this explanation to the summary.

This revision is now accepted and ready to land.May 30 2022, 9:37 AM
martong edited the summary of this revision. (Show Details)May 31 2022, 11:32 PM
This revision was landed with ongoing or failed builds.May 31 2022, 11:42 PM
This revision was automatically updated to reflect the committed changes.
martong marked an inline comment as done.
ASDenysPetrov reopened this revision.EditedSep 26 2022, 9:50 AM

I've just investigated this patch deeply. I think we have wrong logic here.

Assume we have (int)(short)(int x). VisitSymbolCast will try to get the constant recursively in the next order:

  • (short)(int x)
  • (int x)

And here is my concern. Whether it is correct to get the range for int x and consider it as a correct simplification of (int)(short)(int x). IMO it can't be simplified like that. It should go through the set of conventions and intersections.
Working on an integral cast feature I've encountered in the situation when I can get the range for (short)(int x) in the chain above. And it logically matches with the original symbol (int)(short)(int x). After that, your solution (this patch) stops and returns this range, missing the range associated with int x. That borns the problem:

this patchafter my patch
(int)(short)(int x) = 0(int)(short)(int x) = 0
(short)(int x) = ? / go next(short)(int x) = 0 / match
(int x) = 1 / match(int x) = 1 / doesn't reach because of the previous match
resultresult
0 and 1 infeasible0 and 0 feasible

I think we have to improve this behavior.

This revision is now accepted and ready to land.Sep 26 2022, 9:50 AM
martong added a comment.EditedSep 27 2022, 2:17 AM

Assume we have (int)(short)(int x). VisitSymbolCast will try to get the constant recursively in the next order:

  • (short)(int x)
  • (int x)

And here is my concern. Whether it is correct to get the range for int x and consider it as a correct simplification of (int)(short)(int x). IMO it can't be simplified like that. It should go through the set of conventions and intersections.

I agree that it is not correct to do that if there is a range associated for the castee symbol, in those cases we should consult with the range based solver. However, this patch handles the case when a simple value (a range with exactly one element) is associated to the castee. And in that case, this ought to be correct, the cast is handled properly via these functions: SValBuilder::evalCast -> EvalCastVisitor::VisitNonLocConcreteInt -> APSIntType::apply.

Generally, the Simplifier::Visit functions should return a simplified symbol only if during the visitation we could find a constant value associated to any of the visited nodes, it does constant folding. If we simplify with a symbol that is constrained into a range and not into a simple value, then there is a bug, we should fix that.

ASDenysPetrov added a comment.EditedSep 27 2022, 9:44 AM

@martong wrote:
I think you did get it. I'm not talking about range but about concrete int. And I see that the current behavior needs an improvement or rework.
Consider next code, which I used in my example:

1. void test(int x) {
2.   assert((short)x == 0);
3.   clang_analyzer_eval(x == 1);
4. }

Your patch does next:

  • on the line 3 we know that (int)(short)(int x) = 0 and (int x) = 1
  • simplify (int)(short)(int x). Try to get a constant for (short)(int x). Result is nothing because it is not presented in the range map. Continue unwrapping.
  • go deeper and simplify (short)(int x). Try to get a constant for (int x). Result is 1. Stop visiting.
  • return 1.
  • intersect the range of the original symbol (int)(short)(int x) which is 0 and the range which is returned - 1
  • result is an infeasible state.

My patch above yours does next:

  • on the line 3 we know that (int)(short)(int x) = 0 and (int x) = 1, but now we also know that (short)(int x) = 0 as an equivalent for (int)(short)(int x) due to the improvement.
  • simplify (int)(short)(int x). Try to get a constant for (short)(int x). Result is 0. Stop visiting.
  • return 0.
  • intersect the range of the original symbol (int)(short)(int x) which is 0 and the range which is returned - 0
  • result is a feasible state.

Here what I'm saying. This is not about ranges. This simplification dosn't take into account that different operands of the cast symbol may have different associated ranges or constants. And what do we have to do with them in this case?

ASDenysPetrov requested changes to this revision.Sep 27 2022, 9:44 AM
This revision now requires changes to proceed.Sep 27 2022, 9:44 AM

Yeah okay. I get it now. Thank you for your patience and your time on elaborating the issue.

First, I think we'd need to fabricate a test case that shows us the bug even without applying your patch (D103096).

Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096. But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

Yeah okay. I get it now. Thank you for your patience and your time on elaborating the issue.

First, I think we'd need to fabricate a test case that shows us the bug even without applying your patch (D103096).

Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096.

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?

But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what Sym it is associated. The main question, as I said, what we shall do with two different constants.

@martong

Then we can iterate onto the solution. What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096.

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?

But then there is an open question: what should we do if there is another kind of symbol in the chain of SymbolCasts? E.g SymbolCast, UnarySymExpr, SymbolCast.

IMO, it doesn't matter, since we are just waiting for a constant. Doesn't matter to what Sym it is associated. The main question, as I said, what we shall do with two different constants.

Do you have any ideas?

Suppose we have found the way to handle it. But what if we find a contradiction while simplifying, like (short)(int x) = 0 and (int x) = 1. So that we would already know that it is impossible. What SVal should we return in such case? Undef? Unknown? Original one?

I think the SValBuilder should not reach such an ambiguity. Ideally, both (short)(int x) and (int x) should be constrained to the same value. If that is not the case then we already have an infeasible state. But it should not be the task of the SValBuilder to recognize this situation. So, when we add the second constraint then we should notice the contradiction immediately. This check should be done in the ConstraintAssignor.

What we could do is to collect the constants and types on the way of the cast visitation and then apply the same logic that you have in D103096

I don't think anymore that this is would be the way forward, as I said, let's do the check for infeasibility in the ConstraintAssignor.

Actually, you already have a logic for checking if there is a contradiction in your D103096 patch, in ConstraintAssignor::updateExistingConstraints:

// Update constraints in the map which bitwidth is equal or lower then
// `MinBitWidth`.
if (CM) {
  for (auto &Item : *CM) {
    // Stop after reaching a bigger bitwidth.
    if (Item.first > MinBitWidth)
      break;
    RangeSet RS = RangeFactory.intersect(Item.second, CastTo(Item.first));
    // If the intersection is empty, then the branch is infisible.
    if (RS.isEmpty()) {
      State = nullptr;
      return false;
    }
    NewCM = CMF.add(NewCM, Item.first, RS);
  }
}

So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?

@martong

So, the intersection should be empty in the above mentioned ambiguous case, shouldn't' it?

No. My current implementation doesn't contain this check in ConstraintAssignor in favor of the solution simplification. It'll be added in the next patches.

But still I think, this solution is not accomplished. Consider next. Say, symbol (char)(int x) is associated to the range [42, 42], and can be simplified to constant 42 of char. And (int x) is associated to the range [43, 43] Your patch will omit looking into the symbol itself and unwrap it starting visiting the operand instead. Thus, it will return the constant 43 for (int x).
Moreover, if (int x) is 43, it will contradict to 42 (aka infeasible state). We also have no decision what to return in this case.

For me, this is really uncertain patch that I'm not 100% sure in.

I don't see this case different to the unary expressions. Consider the unary minus for example. Let's say, the symbol a is constrained to [1, 1] and then -a is constrained to [-2, -2]. This is clearly an infeasible state and the analyzer will terminate the path. So, no further call of SValBuilder should happen from that point. That is, it is meaningless to evaluate -(-a) if there is a contradiction in the constraints that are assigned to the sub-symbols of the the symbol tree of -(-a).
Here is a test case that demonstrates this (this passes with latest llvm/main):

// RUN: %clang_analyze_cc1 %s \
// RUN:   -analyzer-checker=core \
// RUN:   -analyzer-checker=debug.ExprInspection \
// RUN:   -analyzer-config eagerly-assume=false \
// RUN:   -verify

extern void abort() __attribute__((__noreturn__));
#define assert(expr) ((expr) ? (void)(0) : abort())

void clang_analyzer_warnIfReached();
void clang_analyzer_eval(int);

void test(int a, int b) {
  assert(-a == -1);
  assert(a == 1);
  clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
  clang_analyzer_eval(-(-a) == 1); // expected-warning{{TRUE}}

  assert(-b <= -2);
  assert(b == 1);
  clang_analyzer_warnIfReached(); // unreachable!, no-warning
  clang_analyzer_eval(-(-b) == 1); // no-warning
}