This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer] Find constraints that are directly attached to a BinOp
AbandonedPublic

Authored by martong on May 18 2021, 9:34 AM.

Details

Summary

Currently, if we have a constraint on a binary operator expression and then we add
another constraint to any of the binop's sub-expressions, then the first
constraint is not considered by the engine:

int foo() {
  if (x + y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} OK.
  if (y != 0)
    return 0;
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} FAILS!
}

Interestingly, if we first constrain y and then (x + y), the
behaviour is what we'd normally expect:

int foo() {
  if (y != 0)
    return 0;
  if (x + y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}} OK.
}

The discrepancy stems from the underlying implementation of the engine:
In the latter case, the Environment of the current ExplodedNode has a
(x + y) -> [0, 0] binding. However, in the first case we have a [y] -> 0
binding in the current ExplodedNode, and when we build the SVal for the
(x + y) expression we simply build up the [x + 0] SVal, i.e. we never
try to find the constraint for (x + y).

In this patch, I add a StmtVisitor that builds up an SVal consisting
only of LValue SVals. This SVal can then be used to query the constraint
manager, so we can find the desired constraint.

An alternative implementation could have been to traverse back in the
ExplodedGraph to the node where we can find the original binding of the
x + y expression. This seemed far from ideal, as we may not know how far
back we should traverse.

Diff Detail

Event Timeline

martong created this revision.May 18 2021, 9:34 AM
martong requested review of this revision.May 18 2021, 9:34 AM
Herald added a project: Restricted Project.May 18 2021, 9:34 AM
Herald added a subscriber: cfe-commits.
NoQ added a comment.May 18 2021, 9:53 AM

Ok so the state has enough constraints. It's enough to know that $x + $y == 0 and $y == 0 in order to deduce that $x + 0 == 0. But you're saying that we don't know how to infer it so instead of making us able to infer it let's make us ask the solver easier questions. That's a terrible hack but at this point i think we might as well take it. @vsavchenko WDYT, what would it take to fix the solver instead?

Ok so the state has enough constraints. It's enough to know that $x + $y == 0 and $y == 0 in order to deduce that $x + 0 == 0. But you're saying that we don't know how to infer it so instead of making us able to infer it let's make us ask the solver easier questions. That's a terrible hack but at this point i think we might as well take it. @vsavchenko WDYT, what would it take to fix the solver instead?

Actually, the problem is more subtle than that. The solver is already capable of solving such equations; the test below passes on the main branch:

int test(int x, int y) {
  if (y != 0)
    return 0;
  if (x + y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(x == 0);     // expected-warning{{TRUE}}
  return 0;
}

However, we can generalize the original problem by using 3 variables. In this case no solver is capable of solving it, because we simply don't have enough equations:

int test(int x, int y, int z) {
  if (x + y + z != 0)
    return 0;
  if (z != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0);  // UNKNOWN !
  clang_analyzer_eval(z == 0);          // OK.
  return 0;
}

The crux of the problem is that we don't ask the right question of the solver, because we are literally unable to ask the right question (as of main). The solver takes an SVal (more precisely, a symbolic expression) and can give back the related ranges or a concrete value.
We can see that the constraints are there, and they are correct:

"constraints": [
   { "symbol": "reg_$1<int z>", "range": "{ [0, 0] }" },
   { "symbol": "(reg_$0<int x>) + (reg_$1<int y>) + (reg_$1<int z>)", "range": "{ [0, 0] }" }
 ],

However, we never try to query "(reg_$0<int x>) + (reg_$1<int y>) + (reg_$1<int z>)"; instead we ask the solver separately about "(reg_$0<int x>)", "(reg_$0<int y>)" and "(reg_$0<int z>)". This is because of how we bind the SVals to the expressions and because of the "constant folding"-like mechanism encoded in getKnownValue and in simplifySVal. We had previously bound the ConcreteInt 0 to the ImplicitCastExpr of 'z', and this fuels the "constant folding" mechanism (which is very useful in most cases but causes this logical error).

martong edited the summary of this revision. (Show Details)May 18 2021, 12:25 PM
NoQ added a comment.May 18 2021, 5:20 PM

Ok so the state has enough constraints. It's enough to know that $x + $y == 0 and $y == 0 in order to deduce that $x + 0 == 0. But you're saying that we don't know how to infer it so instead of making us able to infer it let's make us ask the solver easier questions. That's a terrible hack but at this point i think we might as well take it. @vsavchenko WDYT, what would it take to fix the solver instead?

Actually, the problem is more subtle than that. The solver is already capable of solving such equations; the test below passes on the main branch:

int test(int x, int y) {
  if (y != 0)
    return 0;
  if (x + y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(x == 0);     // expected-warning{{TRUE}}
  return 0;
}

In this case the equations are $y == 0 and $x + 0 == 0 which is much easier to solve. This happens for the same reason that your patch is needed in the first place: we're eagerly substituting a constant.

manas added a subscriber: manas.May 18 2021, 5:35 PM

In this case the equations are $y == 0 and $x + 0 == 0 which is much easier to solve.

Yes, you are right.

This happens for the same reason that your patch is needed in the first place: we're eagerly substituting a constant.

Absolutely, that's the point. On the other hand, it is very important to emphasize that we cannot solve this problem with a stronger solver; see my example with 3 variables and two equations above.

Also, another interesting question (raised by @steakhal) is whether we can/could handle associativity somehow. Commutativity does not seem to be handled by the constraint manager (alas, we'd search for "y + x" when we query "x + y").

My take on this: for whatever approach, we will definitely be able to construct a more nested/complex example that it cannot handle.
For this patch, I'm wondering about something like this:

int foo() {
  if (z != 0)
    return 0;
  if (x + y + z != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} OK.
  if (y != 0)
    return 0;
  clang_analyzer_eval(y == 0);         // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(z == 0);         // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} ?
}

As for the solver, it is something that has tormented me for a long time. Is there a way to avoid a full-blown brute-force check of all existing constraints and get some knowledge about symbolic expressions from the constraints these symbolic expressions are actually part of (right now we can reason about an expression only if we know some information about its own parts, aka its operands)?

NoQ added a comment.EditedMay 19 2021, 1:00 PM

This happens for the same reason that your patch is needed in the first place: we're eagerly substituting a constant.

Absolutely, that's the point. On the other hand, it is very important to emphasize that we cannot solve this problem with a stronger solver, see my example with 3 variables and two equations above.

Well, in that other example, we should be asking about $x + $y + 0 == 0 at some point. And then the solver should be able to deduce it from $x + $y + $z == 0 and $z == 0. If we're not asking this question, that's obviously the first problem to get out of our way.

martong updated this revision to Diff 346966.May 21 2021, 2:45 AM
  • Add test case for commutativity
  • Add test case for Valeriy's case

My take on this: for whatever approach, we will definitely be able to construct a more nested/complex example that it cannot handle.
For this patch, I'm wondering about something like this:

int foo() {
  if (z != 0)
    return 0;
  if (x + y + z != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} OK.
  if (y != 0)
    return 0;
  clang_analyzer_eval(y == 0);         // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(z == 0);         // expected-warning{{TRUE}} OK.
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}} ?
}

I've added a test case for this, and it passes with the patch.

Also, I added another test case checking that "y + x" should be zero if "x + y" is constrained. This fails. To support that, we would have to synthesize all combinations of the commutative expressions, which of course would not scale. (Perhaps we could do this only for the top-level expression; that way the most common case could be supported.)

This happens for the same reason that your patch is needed in the first place: we're eagerly substituting a constant.

Absolutely, that's the point. On the other hand, it is very important to emphasize that we cannot solve this problem with a stronger solver, see my example with 3 variables and two equations above.

Well, in that other example, we should be asking about $x + $y + 0 == 0 at some point. And then the solver should be able to deduce it from $x + $y + $z == 0 and $z == 0. If we're not asking this question, that's obviously the first problem to get out of our way.

Yeah, you are right, and my analysis was not precise enough. I can confirm that currently - at main - we do query $x + $y + 0 == 0, and the solver is incapable of solving it from the given constraints. So, yes, this patch is a "workaround" to ask the solver what it knows about the value of the verbatim expression. This has its limitations (see the test case for commutativity), but I still think it could make the analyzer more precise.

As for the solver, it is something that has tormented me for a long time. Is there a way to avoid a full-blown brute-force check of all existing constraints and get some knowledge about symbolic expressions from the constraints these symbolic expressions are actually part of (right now we can reason about an expression only if we know some information about its own parts, aka its operands)?

Well, it is a hard question.
I've been thinking about building a "parent" map for the sub-expressions, like we do in the AST (see clang::ParentMap). We could use this parent map to inject new constraints during the "constant folding" mechanism.
So, let's say we have $x + $y = 0, and then when we process $y = 0 we'd add a new constraint: $x = 0. We could add this new constraint because the parent map tells us that $y is connected to $x + $y, so we know we have to visit that expression.
What do you think, could it work?

As for the solver, it is something that has tormented me for a long time. Is there a way to avoid a full-blown brute-force check of all existing constraints and get some knowledge about symbolic expressions from the constraints these symbolic expressions are actually part of (right now we can reason about an expression only if we know some information about its own parts, aka its operands)?

Well, it is a hard question.
I've been thinking about building a "parent" map for the sub-expressions, like we do in the AST (see clang::ParentMap). We could use this parent map to inject new constraints during the "constant folding" mechanism.
So, let's say we have $x + $y = 0, and then when we process $y = 0 we'd add a new constraint: $x = 0. We could add this new constraint because the parent map tells us that $y is connected to $x + $y, so we know we have to visit that expression.
What do you think, could it work?

Yes, this was exactly my line of thinking. Instead of trying to use hard logic every time we are asked a question, we can try to simplify existing constraints based on the new bit of information.
The main problem with this is the tree nature of symbolic expressions. If we have something trivial like $x + $y - sure. But something with a lot more nested subexpressions can get tricky really fast. That can be solved if we have a canonical form for the trees (this will resolve the problem with $x + $y versus $y + $x as well). Right now, we have bits of this all around our codebase, but those are more like workarounds as opposed to one general approach.

As for the solver, it is something that has tormented me for a long time. Is there a way to avoid a full-blown brute-force check of all existing constraints and get some knowledge about symbolic expressions from the constraints these symbolic expressions are actually part of (right now we can reason about an expression only if we know some information about its own parts, aka its operands)?

Well, it is a hard question.
I've been thinking about building a "parent" map for the sub-expressions, like we do in the AST (see clang::ParentMap). We could use this parent map to inject new constraints during the "constant folding" mechanism.
So, let's say we have $x + $y = 0, and then when we process $y = 0 we'd add a new constraint: $x = 0. We could add this new constraint because the parent map tells us that $y is connected to $x + $y, so we know we have to visit that expression.
What do you think, could it work?

Yes, this was exactly my line of thinking. Instead of trying to use hard logic every time we are asked a question, we can try to simplify existing constraints based on the new bit of information.
The main problem with this is the tree nature of symbolic expressions. If we have something trivial like $x + $y - sure. But something with a lot more nested subexpressions can get tricky really fast. That can be solved if we have a canonical form for the trees (this will resolve the problem with $x + $y versus $y + $x as well). Right now, we have bits of this all around our codebase, but those are more like workarounds as opposed to one general approach.

OK, I have started working on a constraint-manager-based prototype implementation with a parent map and a hook in setConstraint. The test file in this patch (even the commutativity test) passes with it, so it looks promising. I hope I can share it next week. Also, I'd like to do some performance measurements, both on this patch and on the new solver-based one.

I think it would also be beneficial to document the semantics. Whenever we say $x + $y, it's not entirely clear whether we mean the addition operation in a mathematical sense or we follow C/C++ language semantics. Right now it's mostly mixed within the analyzer. It would be really nice to see, for example, how and when wrapping is considered, and what happens if the types of the operands don't match in bitwidth or signedness. This is sort of the reason why ArrayBoundV2 has false positives in some scenarios.

As for the solver, it is something that has tormented me for a long time. Is there a way to avoid a full-blown brute-force check of all existing constraints and get some knowledge about symbolic expressions from the constraints these symbolic expressions are actually part of (right now we can reason about an expression only if we know some information about its own parts, aka its operands)?

Well, it is a hard question.
I've been thinking about building a "parent" map for the sub-expressions, like we do in the AST (see clang::ParentMap). We could use this parent map to inject new constraints during the "constant folding" mechanism.
So, let's say we have $x + $y = 0, and then when we process $y = 0 we'd add a new constraint: $x = 0. We could add this new constraint because the parent map tells us that $y is connected to $x + $y, so we know we have to visit that expression.
What do you think, could it work?

Yes, this was exactly my line of thinking. Instead of trying to use hard logic every time we are asked a question, we can try to simplify existing constraints based on the new bit of information.
The main problem with this is the tree nature of symbolic expressions. If we have something trivial like $x + $y - sure. But something with a lot more nested subexpressions can get tricky really fast. That can be solved if we have a canonical form for the trees (this will resolve the problem with $x + $y versus $y + $x as well). Right now, we have bits of this all around our codebase, but those are more like workarounds as opposed to one general approach.

OK, I have started working on a constraint-manager-based prototype implementation with a parent map and a hook in setConstraint. The test file in this patch (even the commutativity test) passes with it, so it looks promising. I hope I can share it next week. Also, I'd like to do some performance measurements, both on this patch and on the new solver-based one.

A parent map is more or less trivial for situations like $x + $y, where we simply relate $x and $y. But what should we do in the more general case, when $x and $y are expressions themselves? One simple example here is $a + $b + $c as originally constrained and $a + $c that got constrained later on. Should we include $a, $b, $c, $a + $b, $a + $c, and $b + $c in the parent map to make it work?

And again, even with a parent map, we still need canonicalization.

Alright, I see your point. I agree that solving the problem of "$a + $b + $c
constrained and then $a + $c got constrained" requires canonicalization.
However, canonicalization seems to be non-trivial to implement. And there are
some other, easier cases that I think we could (should) solve before starting to
implement canonicalization.

The parent map is just an optimization. The functionality should work even if
we use brute force to go through all existing constraints.

Because of the above, I suggest the following steps. These steps could be
implemented as individual patches that depend on each other, and once all of
them are accepted we could merge them in one lock-step.

  1. Update setConstraint to simplify existing constraints (and add the simplified constraint) when a new constraint is added. In this step we'd simply iterate over all existing constraints and try to simplify them with simplifySVal. This would solve the simplest cases where we have two symbols in the tree. E.g. these cases would pass:

int test_rhs_further_constrained(int x, int y) {
  if (x + y != 0)
    return 0;
  if (y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}}
  return 0;
}

int test_lhs_and_rhs_further_constrained(int x, int y) {
  if (x % y != 1)
    return 0;
  if (x != 1)
    return 0;
  if (y != 2)
    return 0;
  clang_analyzer_eval(x % y == 1); // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 2);     // expected-warning{{TRUE}}
  return 0;
}
  2. Add the capability to simplify more complex constraints, where there are 3 symbols in the tree. In this change I suggest the extension and overhaul of simplifySVal. This change would make the following cases pass (notice the 3rd check in each test-case).

int test_left_tree_constrained(int x, int y, int z) {
  if (x + y + z != 0)
    return 0;
  if (x + y != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(x + y == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(z == 0);         // expected-warning{{TRUE}}
  x = y = z = 1;
  return 0;
}

int test_right_tree_constrained(int x, int y, int z) {
  if (x + (y + z) != 0)
    return 0;
  if (y + z != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(y + z == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(x == 0);         // expected-warning{{TRUE}}
  return 0;
}
  3. Add canonical trees to the solver. The way we should do this is to build "flat" sub-trees of associative operands. E.g. b + a + c becomes +(a, b, c), i.e. a tree with a root and 3 children [1]. The ordering of the children could be done by their addresses (Stmt *), and we could simply store them in an array. Probably we'll need a new mapping: SymbolRef -> CanonicalTree.

When a new constraint is added, we build up its canonical tree (if it does not
exist yet). Then, we check each existing constraint's canonical tree to see
whether the newly constrained expression is a sub-expression of it. (The
check could be a simple recursive descending search in the children arrays of
the nodes.) If it is a sub-expression, then we simplify it with a customized
canonical tree visitor that calls evalBinOp appropriately.

In this step we should handle only + and *.

This would pass the below tests:

int test_left_tree_constrained_commutative(int x, int y, int z) {
  if (x + y + z != 0)
    return 0;
  if (y + x != 0)
    return 0;
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(x + y == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(z == 0);         // expected-warning{{TRUE}}
  return 0;
}

int test_associative(int x, int y, int z) {
  if (x + y + z != 0)
    return 0;
  if (x + z != 0)
    return 0;
  clang_analyzer_printState();
  clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(x + z == 0);     // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 0);         // expected-warning{{TRUE}}
  return 0;
}

  4. Extend 3) with - and /: lhs - rhs becomes lhs + (-rhs) and lhs / rhs is substituted with lhs * (1/rhs).

  5. There are still some expressions that the solver will not be able to reason about (e.g. <<), but it may happen that we store a constraint for them. In these cases it is a good idea to search for the expression in the constraint map, but without constant folding. This functionality might be covered by step 2). If not, then https://reviews.llvm.org/D102696 could be a good alternative.

  6. Do performance measurements and perhaps enhance with parent maps.

I am planning to upload a patch for steps 1) and 2) very soon.

[1] Hans Vangheluwe et al, An algorithm to implement a canonical representation
of algebraic expressions and equations in AToM3

Oh, wow. Great plan! I'd like to participate. 😊
I have a few comments:

Alright, I see your point. I agree that solving the problem of "$a + $b + $c
constrained and then $a + $c got constrained" requires canonicalization.

That was actually not an example for canonicalization, but rather of how the ParentMap will exponentially explode due to the number of combinations of sub-expressions if we want to make it useful for all cases.
If we do want to handle situations like this, the ParentMap should be rethought.

  1. Update setConstraint to simplify existing constraints (and add the simplified constraint) when a new constraint is added. In this step we'd simply iterate over all existing constraints and try to simplify them with simplifySVal

I like this one! So, to be more specific: if we add a constraint Sym -> C, where C is a constant, we can try to simplify other constrained symbols where Sym is a sub-expression.
I want to add that now (because of me 😅) we do not store constraints as a map Sym -> RangeSet; it's a bit trickier.
We have a few mappings: Sym -> EquivalenceClass (member-to-class relationship), EquivalenceClass -> RangeSet, EquivalenceClass -> {Sym} (class-to-members relationship), and EquivalenceClass -> {EquivalenceClass} (to track disequalities).
It means that we will need to replace Sym with its simplified version in Sym -> EquivalenceClass and EquivalenceClass -> {Sym}.
Not that it is impossible or anything, but it makes it a bit harder as there are more moving pieces.

  2. Add the capability to simplify more complex constraints, where there are 3 symbols in the tree. In this change I suggest the extension and overhaul of simplifySVal. This change would make the following cases pass (notice the 3rd check in each test-case).

This sounds great, but we do need to think about performance, so we should discuss how we plan to implement such an extension.

  3. Add canonical trees to the solver. The way we should do this is to build "flat" sub-trees of associative operands. E.g. b + a + c becomes +(a, b, c), i.e. a tree with a root and 3 children [1]. The ordering of the children could be done by their addresses (Stmt *), and we could simply store them in an array. Probably we'll need a new mapping: SymbolRef -> CanonicalTree.

I think that flattening is a good idea (and it is widespread in other solvers): a "flat" sub-tree is a non-binary (n-ary) tree.
However, it is useful ONLY if we can efficiently support an isSubsetOf predicate, specifically for cases like +(a, b, c) and +(a, c).

I don't think that Sym -> Canonical is a good idea. The main goal of canonicalization is to compact a potentially exponential number of equivalent ways of spelling the same symbolic expression into a single representation.
So, it means that we expect Canonicalization(Sym) to be called for multiple Syms producing identical results. This way Sym -> Canonical is simply a cache for Canonicalization (i.e. maybe it shouldn't even be associated with a state).

Then, we check each existing constraints' canonical tree to check
whether the newly constrained expression is a sub-expression of that.

I'm worried about performance implications of this.

  4. Extend 3) with - and /: lhs - rhs becomes lhs + (-rhs) and lhs / rhs is substituted with lhs * (1/rhs).

Is that true for bit-vectors, though?

I just noticed that I simply forgot to click submit yesterday 😩