This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
lib/StaticAnalyzer/Core/
-
StaticAnalyzer/
-
Core/
15/15
SimpleSValBuilder.cpp
-
test/Analysis/
-
Analysis/
12/12
svalbuilder-simplify-compound-svals.cpp
-
taint-tester.c

Differential D103317

[Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree
ClosedPublic

Authored by martong on May 28 2021, 6:40 AM.

Download Raw Diff

Details

Reviewers

vsavchenko
NoQ
steakhal
Szelethus

Commits

rG0a17896fe6fd: [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in…

Summary

Add the capability to simplify more complex constraints where there are 3
symbols in the tree. In this change I extend simplifySVal to query constraints
of children sub-symbols in a symbol tree. (The constraint for the parent is
asked in getKnownValue.)

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

martong created this revision.May 28 2021, 6:40 AM

Herald added a reviewer: Szelethus. · View Herald TranscriptMay 28 2021, 6:40 AM

Herald added subscribers: manas, ASDenysPetrov, gamesh411 and 10 others. · View Herald Transcript

martong requested review of this revision.May 28 2021, 6:40 AM

Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2021, 6:40 AM

Herald added a subscriber: cfe-commits. · View Herald Transcript

martong added a parent revision: D103314: [Analyzer][solver] Simplify existing constraints when a new constraint is added.May 28 2021, 6:40 AM

martong mentioned this in D102696: [Analyzer] Find constraints that are directly attached to a BinOp.May 28 2021, 6:43 AM

Looking great, thanks!
I have a couple of notes below.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	Do we actually produce them?
1174–1175	It seems like a very common pattern now: SVal Const = getConst(Expr); if (Const.isUndef()) return Visit(Expr); return Const; At least we can make a function out of this, so we don't repeat this pattern over and over. But I was thinking more radically to actually replace `Visit` with this, what do you think?
clang/test/Analysis/simplify-complex-constraints.cpp
25–27 ↗	(On Diff #348511)	Since it's not really specific to addition, maybe we can have tests for other operators?

Harbormaster completed remote builds in B106702: Diff 348511.May 28 2021, 7:19 AM

ASDenysPetrov added inline comments.Jun 2 2021, 3:47 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	You can met SymbolCast here if `evalCast` produced it. See `SValBuilder::evalCastSubKind(nonloc::SymbolVal...)`: if (!Loc::isLocType(CastTy)) if (!IsUnknownOriginalType \|\| !CastTy->isFloatingType() \|\| T->isFloatingType()) return makeNonLoc(SE, T, CastTy); Also my patch will produce a lot of integral cast symbols D103096. You are welcome to examine it.

vsavchenko added inline comments.Jun 2 2021, 4:03 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	In my comment I meant `IntSymExpr`

ASDenysPetrov added inline comments.Jun 2 2021, 4:38 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	Oh, anyway :-)

ASDenysPetrov added inline comments.Jun 2 2021, 5:01 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	I think we should remove ConcreteInt and and as a consequence IntSymExpr, SymIntExpr in the nearest future. SymbolVal should be reworked to store a QualType, a RangeSet and an expression pointer itself. IMO SymSymExpr and SymbolVal is sufficient to make any calculations. It may help generalize a lot of them. ConcreteInt is just a special case for SymbolVal. More than that PointerToMember can be removed as well, because we would get an info being a pointer to member from QualType in this way.

Returning to the discussion raised in D102696, I'd like to share my vision.
I think we can use much easier approach to use valid constraints at any point of time.
The main idea is lazy-reasoning of the ranges.
This approach:

doesn't need to use canonicalization;
doesn't need to update all the constraints on each setConstraint call (using brute-force or any other structure traversals);
add an additional bindings for every simple operand on each assignment or initialization.
remove an invalid bindings for every simple operand on each assignment or initialization.
recursively get the range when and only when needed (lazy-reasoning).

Example:

a = b + 1;
c = a - 5;
if (b != 42) return 0;
if(c == 38) return 1;
return 0;

a = b + 1;

LHS a lookup. LHS not found.
Add binding $a = $b + 1
Traverse through RHS b + 1.
b lookup. b not found.
Built a new expression a - 1 for b.
Add binding for b: $b = $a - 1
1 is a constant. Skip.

c = a - 5;

LHS c lookup. LHS not found.
Add binding $c = $a - 5
Traverse through RHS a - 5.
a lookup. a found.
If LHS not found and a found, then skip.
5 is a constant. Skip.

if (b != 42) return;

Get the range of $b. Direct range not found. Add $b to the list of visited symbols.
Substitute bindings. From $b != 42 to $a - 1 == 42.
Traverse through $a - 1.
Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
1 is a constant. Skip.
Substitute bindings. From $a - 1 == 42 to $b + 1 - 1 == 42.
Traverse through $b + 1 - 1.
$b is in the list of already visited symbols. No way to find a range.
Create a range from the type [MIN, MAX].
Perform comparison and produce a constraint for b [42, 42].

if(c == 38) return;

Get the range of $c. Direct range not found. Add $c to the list of visited symbols.
Substitute bindings. From $c == 38 to $a - 5 == 38.
Traverse through $a - 5.
Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
5 is a constant. Skip.
Substitute bindings. From $a - 5 == 38 to $b + 1 - 5 == 38.
Traverse through $b + 1 - 5.
Get the range of $b. Direct range found [42, 42].
1 is a constant. Skip.
Calculate(simplificate the expression etc.) a new range for $c [38, 38].
Condition is true.

return 1

P.S. I've just started thinking of the way to solve this problem. For now it's just a sketch of where I'm going. I think this can even replace EquivalenceClass.

In D103317#2793557, @ASDenysPetrov wrote:
Returning to the discussion raised in D102696, I'd like to share my vision.
I think we can use much easier approach to use valid constraints at any point of time.
The main idea is lazy-reasoning of the ranges.
This approach:

doesn't need to use canonicalization;

doesn't need to update all the constraints on each setConstraint call (using brute-force or any other structure traversals);

add an additional bindings for every simple operand on each assignment or initialization.

remove an invalid bindings for every simple operand on each assignment or initialization.

recursively get the range when and only when needed (lazy-reasoning).

Example:
a = b + 1;
c = a - 5;
if (b != 42) return 0;
if(c == 38) return 1;
return 0;
a = b + 1;

LHS a lookup. LHS not found.

Add binding $a = $b + 1

Traverse through RHS b + 1.

b lookup. b not found.

Built a new expression a - 1 for b.

Add binding for b: $b = $a - 1

1 is a constant. Skip.

c = a - 5;

LHS c lookup. LHS not found.

Add binding $c = $a - 5

Traverse through RHS a - 5.

a lookup. a found.

If LHS not found and a found, then skip.

5 is a constant. Skip.

if (b != 42) return;

Get the range of $b. Direct range not found. Add $b to the list of visited symbols.

Substitute bindings. From $b != 42 to $a - 1 == 42.

Traverse through $a - 1.

Get the range of $a. Direct range not found. Add $a to the list of visited symbols.

1 is a constant. Skip.

Substitute bindings. From $a - 1 == 42 to $b + 1 - 1 == 42.

Traverse through $b + 1 - 1.

$b is in the list of already visited symbols. No way to find a range.

Create a range from the type [MIN, MAX].

Perform comparison and produce a constraint for b [42, 42].

if(c == 38) return;

Get the range of $c. Direct range not found. Add $c to the list of visited symbols.

Substitute bindings. From $c == 38 to $a - 5 == 38.

Traverse through $a - 5.

Get the range of $a. Direct range not found. Add $a to the list of visited symbols.

5 is a constant. Skip.

Substitute bindings. From $a - 5 == 38 to $b + 1 - 5 == 38.

Traverse through $b + 1 - 5.

Get the range of $b. Direct range found [42, 42].

1 is a constant. Skip.

Calculate(simplificate the expression etc.) a new range for $c [38, 38].

Condition is true.

return 1

P.S. I've just started thinking of the way to solve this problem. For now it's just a sketch of where I'm going.

Great to know that you are also thinking about this problem!
The more the merrier!

I have a couple of notes about your suggestion.
First is minor, a = b + 1 and c = a + 5 won't actually produce any constraints. Starting from those statements if you request a symbol for a you will get b + 1 and b + 6 for c.
Of course, you can change your code so that we make an assumption that a == b + 1 and c == a + 5 and those would be constraints. That's why I said that it's minor.

Now, for the biggest concern. You cover in a very detailed manner all the traversals and caches for visited symbols, but have Substitute bindings as a sub-step , which is HUGE and absolutely non-trivial.
In your examples, you have a linear chain of related symbolic expressions, i.e. something like this x -> y + 2, y -> z + 3, z -> 4 so that we have x -> 9 after three substitutions. But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N. Imagine that each yK is involved with another M constraints. So, you would need to check x -> y1 -> z11, if it's not working - check x -> y1 -> z12 and so on. When you run out of options for z1K, you need to backtrack and try x -> y2 -> z21, etc. As you can see, this is BRUTE FORCE.

And it couldn't be easier theoretically. So, instead of trying that many options every time we need to judge about, we take a bit simpler shortcut. It's far from being complete, but honestly we're not pursuing completeness. For my example above, whenever we find out that zKM -> 5, we simplify everything that involves zKM and get yK -> 10 and get x -> 15. It is far from being perfect, I don't like that we need to go through all symbols and try to simplify every single symbol, and it would be good to optimize this scenario (mapping symbols to all symbolic expressions that contain them, for example), but essentially it is cheaper in the long run.

In D103317#2793658, @vsavchenko wrote:

But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N.

In my approach it can't be more then one binding for one symbol. Like:
x = y + z; produces $x = $y + $z, $y = $x - $z, $z = $x - $y.
Then if z = a;, all above should be invalidated (removed), because we can't rely on those expressions any more. And new ones should be added $z = $a and $a = $z.

As you can see, this is BRUTE FORCE.

I would say this is a linked list with stop-markers, not blindly passing through all the bindings. And I don't talk about implementation details. For now we just need to elaborate an approach.

In D103317#2793781, @ASDenysPetrov wrote:

In D103317#2793658, @vsavchenko wrote:

But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N.

In my approach it can't be more then one binding for one symbol. Like:
x = y + z; produces $x = $y + $z, $y = $x - $z, $z = $x - $y.

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

In D103317#2793797, @vsavchenko wrote:

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

Are you talking about comparison or assignment? Both assignments can't be valid at the same time, and latter replaces bindings of former. In case of comparisons, they both can be valid.
But we should keep in mind that assignment is a write operation which replaces and invalidates previous bindings, and comparison is a read operation. It can add new bindings but can not remove old ones.
This is what I haven't dig deep enough yet. Let's do this together how we can handle that.

In D103317#2793911, @ASDenysPetrov wrote:

In D103317#2793797, @vsavchenko wrote:

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

Are you talking about comparison or assignment? Both assignments can't be valid at the same time, and latter replaces bindings of former. In case of comparisons, they both can be valid.
But we should keep in mind that assignment is a write operation which replaces and invalidates previous bindings, and comparison is a read operation. It can add new bindings but can not remove old ones.
This is what I haven't dig deep enough yet. Let's do this together how we can handle that.

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:

int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`

There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

! In D103317#2793797, @vsavchenko wrote:

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:
int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`
There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

In D103317#2794099, @ASDenysPetrov wrote:
! In D103317#2793797, @vsavchenko wrote:
I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:
int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`
There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.
Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

Thank you for your comments Denys! It's great to see that some else is also bothered by these problems, and the more approaches we have the deeper the overall understanding is.

In D103317#2794099, @ASDenysPetrov wrote:
! In D103317#2793797, @vsavchenko wrote:
I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:
int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`
There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.
Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

Assignments and initializations don't affect anything, we shouldn't care about them at all because they don't affect symbolic values in any way. You keep bringing up assignments/initializations when I repeatedly tell you that these are orthogonal to symbolic values, they simply associate certain symbolic values/expressions with memory regions (aka bind them).

I want to make sure you understand that there is no silver bullet here. There is no one magic method that will resolve everything in polynomial let alone linear time (unless someone proves that NP = P). This problem is inherently exponential. We try to avoid it at all costs (and, thus, suck sometimes).
In general, when SMT-solvers need to solve a set of bit-vector equations, they use a technique called bit blasting - just turning each n-bit integer into n-separate boolean variables, and using SAT solver after that. More advanced solvers use special theories before to reduce dimensionality of the problem, one of them is equality theory. These theories are not limitations or "special cases", they are pieces of fundamental knowledge about the domain. Those are as generic as you can go. One symbolic expression can be equal to multiple symbolic expressions at the same time, it is the fact, not a special case.

I'm sorry for being too harsh here, but this problem IS harder than you think. If you want to approach it correctly, and not waste your time, you should get these fundamentals straight.

@vsavchenko

I appologize for my immaturity and for the audacity of my infantile assumptions. I admit any constructive criticism. Thank you for referencing the theory. I will thoroughly study the domain in all.

As for the assignments and why I brought them up, that's because I just tried to look at the problem from another angle. Consider two snippets:

void f(int a, int b, int c, int x, int y){
  a = b;
  x = y;
  c = a;
  // What can you say about a, x, c?
  // `a` equals to `b` and 'c'
  // `x` equals to `y`
  // `c` equals to `a` and `b`
}

and

void f(int a, int b, int c, int x, int y){
  if(a == b) {
    if(x == y) {
      if(c == a) {
        // What can you say about a, x, c?
        // `a` equals to `b` and 'c'
        // `x` equals to `y`
        // `c` equals to `a` and `b`
      }
    }
  }
}

They both generates the same relationship between the variables. Assignment seemed to me as just a special case in a true branch of == operator and can be substituted with it in calculations. So, I decided to make some assumptions in this way. Digging more, it opened a lot of gaps and lacks for me.
Next time I will investigate deeper to present a finalized solution and not wasting your time explaining me the basics.

In D103317#2803238, @ASDenysPetrov wrote:

@vsavchenko

I appologize for my immaturity and for the audacity of my infantile assumptions. I admit any constructive criticism. Thank you for referencing the theory. I will thoroughly study the domain in all.

No need to apologize! I do like your enthusiasm, great things come from it. But as I said, I don't want you to waste your own time pursuing things that won't work.

As for the assignments and why I brought them up, that's because I just tried to look at the problem from another angle. Consider two snippets:

OK, let me annotate the code, so we can hopefully close this topic once and for all.

void f(int a, int b, int c, int x, int y){

At this point we have 5 memory regions a, b, c, x and y. They represent the memory cells allocated for the corresponding parameters on stack. When we call function, these actually have some values, but we don't know them due to the limitations of the analysis. For this reason, we say that they contain symbolic values. So, let's simply enumerate that so that there is no misunderstanding. Values: #1, #2, #3, #4, and #5. These are not real numbers, we enumerate some unknown values. As I said, we introduced them as initial values for parameters, so our store (mapping from memory to values) looks likes this: a -> #1, b -> #2, c -> #3, x -> #4, y -> #5. We don't know anything about these values at this point.

a = b;

What do we know at this point? Do we get more information about the value that was stored in a before? No, we overwrote this value entirely, and what changed here is the mapping: a -> #2, b -> #2, c -> #3, x -> #4, y -> #5 because b contained value #2. Symbolic value #1 is gone, it's dead. No-one will ever see or use it again.

x = y;

Using the same logic: a -> #2, b -> #2, c -> #3, x -> #5, y -> #5.

c = a;

And again: a -> #2, b -> #2, c -> #2, x -> #5, y -> #5.

  // What can you say about a, x, c?
  // `a` equals to `b` and 'c'
  // `x` equals to `y`
  // `c` equals to `a` and `b`
}

Yes, they are, but we don't need constraints for that. We know which values are associated with each memory region and we can compare them. That's all, this simple. That's why assignments and initializations DON'T MATTER for the solver, these are simply STORE of some symbolic value into memory. Solver cares only about symbolic values, memory is irrelevant.

and

void f(int a, int b, int c, int x, int y){

As before: a -> #1, b -> #2, c -> #3, x -> #4, y -> #5.

if(a == b) {

Now, the value that we imagine to be in memory region a (namely #1) is NOT GONE! It is still there, it is being used. And what we know after this if, is that on one of the paths after it #1 == #2 and on the other path #1 != #2. These are just values that we know nothing about, but we started to gain some knowledge.
So, the store is the same (the values stored in those memory cell are the same as they were before): a -> #1, b -> #2, c -> #3, x -> #4, y -> #5, but now we add constraint #1 == #2.

if(x == y) {

The same here, store is the same, but constraints are now: #1 == #2 and #4 == #5.

if(c == a) {

#1 == #2 == #3 and #4 == #5

        // What can you say about a, x, c?
        // `a` equals to `b` and 'c'
        // `x` equals to `y`
        // `c` equals to `a` and `b`
      }
    }
  }
}

We can actually figure out that #2 == #3 because of equality theory. We only knew #1 == #2 and that #3 == #1, but equality is a transitive binary relation and we reflect that notion in our equivalence classes data-structure.

They both generates the same relationship between the variables. Assignment seemed to me as just a special case in a true branch of == operator and can be substituted with it in calculations. So, I decided to make some assumptions in this way. Digging more, it opened a lot of gaps and lacks for me.

Relationship between variables are the same, but neither solver nor the analyzer reason about things in terms of variables. It's memory and values that the memory carries. In the first example, we have only 2 values stored in 5 memory locations. In the second example, it's 5 values stored in 5 memory locations. On some of the paths we happen to know how those values are related (which is not true for other paths), that's constraints.

I hope now it's clear.

Next time I will investigate deeper to present a finalized solution and not wasting your time explaining me the basics.

We probably should not have discussions like these because they are not directly about the patch on review. If you want to continue this discussion, let's bring it to [cfe-dev].

martong mentioned this in D112296: [Analyzer][solver] Handle adjustments in constraint assignor remainder.Oct 25 2021, 2:44 PM

Rebase on top of parent revisions
Address review comments

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1150–1151	Do we actually produce them? Yes we produce `IntSymExpr`s.
1174–1175	Good idea!
clang/test/Analysis/simplify-complex-constraints.cpp
25–27 ↗	(On Diff #348511)	Ok, I've added a new test, and modified a bit this test.

Harbormaster completed remote builds in B133918: Diff 386796.Nov 12 2021, 4:05 AM

martong added a parent revision: D113753: [Analyzer][Core] Better simplification in SimpleSValBuilder::evalBinOpNN.Nov 12 2021, 4:06 AM

martong added a parent revision: D113754: [Analyzer][Core] Simplify IntSym in SValBuilder.

martong removed a parent revision: D103314: [Analyzer][solver] Simplify existing constraints when a new constraint is added.Nov 12 2021, 4:09 AM

martong retitled this revision from [Analyzer][engine][solver] Simplify complex constraints to [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree.Nov 12 2021, 4:33 AM

martong edited the summary of this revision. (Show Details)

Update in comments: Unknown -> Undef

Harbormaster completed remote builds in B133928: Diff 386806.Nov 12 2021, 5:22 AM

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1144	Let's be explicit about it.
1148–1151
clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
24	It feels odd that in some of your examples you return some value, but in the rest, you don't.
33	You could additionally assert that `y == 0` and `z == 0`.
51	What if `z` were in the middle? Would it still pass?
66	Would the test pass if you were using `z + y != 1` here?

Return explicitly with UndefinedVal
Unify test cases (return 0 -> return)

martong added inline comments.Nov 23 2021, 7:26 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1144	Good point.
1148–1151	Makes sense.
clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
24	yep, I agree, updated.
33	`y * z == 0` does not imply that both `y` and `z` are `0`. However, we could expect that one of them is `0`, but currently we don't have such a deduction (this might be done in `ConstraintAssignor` in an independent patch).
51	No, that would not pass. This is because we do not handle commutativity, for that we should have a canonical form of the symbol trees (check out point 3 here https://reviews.llvm.org/D102696#2784624).
66	No, because we do not handle commutativity.

In D103317#3127318, @steakhal wrote:

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

Sure!

stats.html184 KBDownload

Teaser:

martong added inline comments.Nov 23 2021, 7:39 AM

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
33	this might be done in ConstraintAssignor in an independent patch Actually, it is not a good idea to bifurcate there (and architecturally impossible ATM), so we should implement that differently.

In D103317#3148868, @martong wrote:

In D103317#3127318, @steakhal wrote:

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

Sure!

stats.html184 KBDownload

Teaser:

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1105	Where did you address this FIXME?

martong marked an inline comment as done.Nov 23 2021, 8:25 AM

martong added inline comments.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1105	I didn't, but this `FIXME` became obsolete at some point in the past. The reasons: We do support `SymExprs`. In `simplifySVal` we have a full blown SymExpr visitor implemented that checks for constant values in the symbol subtrees. And at L1103 we query the top symbol of tree as well. The git history shows that the FIXME was previously this: `// FIXME: Add support for SymExprs in RangeConstraintManager.` But we do support SymExprs already in RangeConstraintManager ...

steakhal added inline comments.Nov 23 2021, 8:40 AM

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1105	Thanks.

Harbormaster completed remote builds in B135633: Diff 389198.Nov 23 2021, 12:19 PM

Add new test case (for no-crash)

Remove superfluous param (int a) from the new test case

I've added a new test case that triggers an assertion. That problem is getting fixed in a new parent patch, which I am about to add to the stack very soon.

Harbormaster completed remote builds in B136157: Diff 389936.Nov 26 2021, 1:45 AM

martong added a parent revision: D114619: [Analyzer][solver] Do not remove the simplified symbol from the eq class.Nov 26 2021, 2:14 AM

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

I did. Average memory usage is unaffected.

stats.html46 KBDownload

In D103317#3155191, @martong wrote:

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

I did. Average memory usage is unaffected.

stats.html46 KBDownload

It's good enough for me, thanks.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
76	I guess, this statement should not be reachable. Please demonstrate it by using the `clang_analyzer_WarnIfReached()`.

martong marked an inline comment as done.Nov 29 2021, 6:57 AM

martong added inline comments.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
76	Actually, L72 simplifies to `1 != b`, L73 simplifies to the same, L74 is `b ^ 2 == 1`. And this system has a solution: `b == -1` satisfies both L72 and L74 conditions. So, this line is reachable, correctly and I've updated the test case like so. Note that, however, the ConstraintAssignor is not smart enough to deduce that `b == -1`. I have a WIP branch to accomplish that here.

Update a test case
Rebase to the parent patch

Harbormaster completed remote builds in B136440: Diff 390343.Nov 29 2021, 8:43 AM

Still looks great.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
76	Uh, you are right. Thanks.

This revision is now accepted and ready to land.Nov 29 2021, 8:55 AM

Closed by commit rG0a17896fe6fd: [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in… (authored by martong). · Explain WhyNov 30 2021, 2:27 AM

This revision was automatically updated to reflect the committed changes.

martong added a commit: rG0a17896fe6fd: [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in….

martong mentioned this in D114887: [Analyzer][solver] Simplification: Do a fixpoint iteration before the eq class merge.Dec 1 2021, 9:12 AM

martong mentioned this in D114619: [Analyzer][solver] Do not remove the simplified symbol from the eq class.Dec 1 2021, 12:46 PM

martong mentioned this in rG20f8733d4b8d: [Analyzer][solver] Simplification: Do a fixpoint iteration before the eq class….Dec 1 2021, 1:23 PM

Revision Contents

Path

Size

clang/

lib/

StaticAnalyzer/

Core/

SimpleSValBuilder.cpp

32 lines

test/

Analysis/

svalbuilder-simplify-compound-svals.cpp

76 lines

taint-tester.c

7 lines

Diff 390637

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp

Show First 20 Lines • Show All 1,096 Lines • ▼ Show 20 Lines if (Optional<loc::ConcreteInt> X = V.getAs<loc::ConcreteInt>())

return &X->getValue(); return &X->getValue();

if (Optional<nonloc::ConcreteInt> X = V.getAs<nonloc::ConcreteInt>()) if (Optional<nonloc::ConcreteInt> X = V.getAs<nonloc::ConcreteInt>())

return &X->getValue(); return &X->getValue();

if (SymbolRef Sym = V.getAsSymbol()) if (SymbolRef Sym = V.getAsSymbol())

return state->getConstraintManager().getSymVal(state, Sym); return state->getConstraintManager().getSymVal(state, Sym);

// FIXME: Add support for SymExprs.

steakhalUnsubmitted

Done

Where did you address this FIXME?

steakhal: Where did you address this FIXME?

martongAuthorUnsubmitted

Done

I didn't, but this FIXME became obsolete at some point in the past.

The reasons:

We do support SymExprs. In simplifySVal we have a full blown SymExpr visitor implemented that checks for constant values in the symbol subtrees. And at L1103 we query the top symbol of tree as well.

The git history shows that the FIXME was previously this: // FIXME: Add support for SymExprs in RangeConstraintManager. But we do support SymExprs already in RangeConstraintManager ...

martong: I didn't, but this `FIXME` became obsolete at some point in the past. The reasons: 1) We do…

steakhalUnsubmitted

Done

Thanks.

steakhal: Thanks.

return nullptr; return nullptr;

} }

SVal SimpleSValBuilder::simplifySVal(ProgramStateRef State, SVal V) { SVal SimpleSValBuilder::simplifySVal(ProgramStateRef State, SVal V) {

// For now, this function tries to constant-fold symbols inside a // For now, this function tries to constant-fold symbols inside a

// nonloc::SymbolVal, and does nothing else. More simplifications should // nonloc::SymbolVal, and does nothing else. More simplifications should

// be possible, such as constant-folding an index in an ElementRegion. // be possible, such as constant-folding an index in an ElementRegion.

Show All 15 Lines SVal cache(SymbolRef Sym, SVal V) {

Cached[Sym] = V; Cached[Sym] = V;

return V; return V;

} }

SVal skip(SymbolRef Sym) { SVal skip(SymbolRef Sym) {

return cache(Sym, SVB.makeSymbolVal(Sym)); return cache(Sym, SVB.makeSymbolVal(Sym));

} }

// Return the known const value for the Sym if available, or return Undef

// otherwise.

SVal getConst(SymbolRef Sym) {

const llvm::APSInt *Const =

State->getConstraintManager().getSymVal(State, Sym);

if (Const)

return Loc::isLocType(Sym->getType()) ? (SVal)SVB.makeIntLocVal(*Const)

: (SVal)SVB.makeIntVal(*Const);

return UndefinedVal();

steakhalUnsubmitted

Done

: (SVal)SVB.makeIntVal(*Const);

- return SVal();

+ return UndefinedSVal();

}

SVal getConstOrVisit(SymbolRef Sym) {

Let's be explicit about it.

steakhal: Let's be explicit about it.

martongAuthorUnsubmitted

Done

Good point.

martong: Good point.

}

SVal getConstOrVisit(SymbolRef Sym) {

const SVal Ret = getConst(Sym);

if (Ret.isUndef())

return Visit(Sym);

return Ret;

steakhalUnsubmitted

Done

SVal getConstOrVisit(SymbolRef Sym) {

- SVal Ret = getConst(Sym);

+ const SVal Ret = getConst(Sym);

if (Ret.isUndef())

- Ret = Visit(Sym);

+ return Visit(Sym);

return Ret;

}

public:

steakhal:

martongAuthorUnsubmitted

Done

Makes sense.

martong: Makes sense.

}

public: public:

Simplifier(ProgramStateRef State) Simplifier(ProgramStateRef State)

: State(State), SVB(State->getStateManager().getSValBuilder()) {} : State(State), SVB(State->getStateManager().getSValBuilder()) {}

SVal VisitSymbolData(const SymbolData *S) { SVal VisitSymbolData(const SymbolData *S) {

// No cache here. // No cache here.

if (const llvm::APSInt *I = if (const llvm::APSInt *I =

SVB.getKnownValue(State, SVB.makeSymbolVal(S))) SVB.getKnownValue(State, SVB.makeSymbolVal(S)))

return Loc::isLocType(S->getType()) ? (SVal)SVB.makeIntLocVal(*I) return Loc::isLocType(S->getType()) ? (SVal)SVB.makeIntLocVal(*I)

: (SVal)SVB.makeIntVal(*I); : (SVal)SVB.makeIntVal(*I);

return SVB.makeSymbolVal(S); return SVB.makeSymbolVal(S);

} }

// TODO: Support SymbolCast. Support IntSymExpr when/if we actually // TODO: Support SymbolCast.

// start producing them.

vsavchenkoUnsubmitted

Done

Do we actually produce them?

vsavchenko: Do we actually produce them?

ASDenysPetrovUnsubmitted

Done

You can met SymbolCast here if evalCast produced it.
See SValBuilder::evalCastSubKind(nonloc::SymbolVal...):

if (!Loc::isLocType(CastTy))
  if (!IsUnknownOriginalType || !CastTy->isFloatingType() ||
      T->isFloatingType())
    return makeNonLoc(SE, T, CastTy);

Also my patch will produce a lot of integral cast symbols D103096. You are welcome to examine it.

ASDenysPetrov: You can met SymbolCast here if `evalCast` produced it. See `SValBuilder::evalCastSubKind(nonloc…

vsavchenkoUnsubmitted

Done

In my comment I meant IntSymExpr

vsavchenko: In my comment I meant `IntSymExpr`

ASDenysPetrovUnsubmitted

Done

Oh, anyway :-)

ASDenysPetrov: Oh, anyway :-)

ASDenysPetrovUnsubmitted

Done

I think we should remove ConcreteInt and and as a consequence IntSymExpr, SymIntExpr in the nearest future. SymbolVal should be reworked to store a QualType, a RangeSet and an expression pointer itself. IMO SymSymExpr and SymbolVal is sufficient to make any calculations. It may help generalize a lot of them. ConcreteInt is just a special case for SymbolVal. More than that PointerToMember can be removed as well, because we would get an info being a pointer to member from QualType in this way.

ASDenysPetrov: I think we should remove **ConcreteInt** and and as a consequence **IntSymExpr**…

martongAuthorUnsubmitted

Done

Do we actually produce them?

Yes we produce IntSymExprs.

martong: > Do we actually produce them? Yes we produce `IntSymExpr`s.

SVal VisitSymIntExpr(const SymIntExpr *S) { SVal VisitSymIntExpr(const SymIntExpr *S) {

auto I = Cached.find(S); auto I = Cached.find(S);

if (I != Cached.end()) if (I != Cached.end())

return I->second; return I->second;

SVal LHS = Visit(S->getLHS()); SVal LHS = getConstOrVisit(S->getLHS());

if (isUnchanged(S->getLHS(), LHS)) if (isUnchanged(S->getLHS(), LHS))

vsavchenkoUnsubmitted

Done

It seems like a very common pattern now:

SVal Const = getConst(Expr);
if (Const.isUndef())
  return Visit(Expr);
return Const;

At least we can make a function out of this, so we don't repeat this pattern over and over.
But I was thinking more radically to actually replace Visit with this, what do you think?

vsavchenko: It seems like a very common pattern now: ``` SVal Const = getConst(Expr); if (Const.isUndef())…

martongAuthorUnsubmitted

Done

Good idea!

martong: Good idea!

return skip(S); return skip(S);

SVal RHS; SVal RHS;

// By looking at the APSInt in the right-hand side of S, we cannot // By looking at the APSInt in the right-hand side of S, we cannot

// figure out if it should be treated as a Loc or as a NonLoc. // figure out if it should be treated as a Loc or as a NonLoc.

// So make our guess by recalling that we cannot multiply pointers // So make our guess by recalling that we cannot multiply pointers

// or compare a pointer to an integer. // or compare a pointer to an integer.

if (Loc::isLocType(S->getLHS()->getType()) && if (Loc::isLocType(S->getLHS()->getType()) &&

Show All 14 Lines SVal VisitSymIntExpr(const SymIntExpr *S) {

S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType())); S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType()));

} }

SVal VisitIntSymExpr(const IntSymExpr *S) { SVal VisitIntSymExpr(const IntSymExpr *S) {

auto I = Cached.find(S); auto I = Cached.find(S);

if (I != Cached.end()) if (I != Cached.end())

return I->second; return I->second;

SVal RHS = Visit(S->getRHS()); SVal RHS = getConstOrVisit(S->getRHS());

if (isUnchanged(S->getRHS(), RHS)) if (isUnchanged(S->getRHS(), RHS))

return skip(S); return skip(S);

SVal LHS = SVB.makeIntVal(S->getLHS()); SVal LHS = SVB.makeIntVal(S->getLHS());

return cache( return cache(

S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType())); S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType()));

} }

SVal VisitSymSymExpr(const SymSymExpr *S) { SVal VisitSymSymExpr(const SymSymExpr *S) {

auto I = Cached.find(S); auto I = Cached.find(S);

if (I != Cached.end()) if (I != Cached.end())

return I->second; return I->second;

// For now don't try to simplify mixed Loc/NonLoc expressions // For now don't try to simplify mixed Loc/NonLoc expressions

// because they often appear from LocAsInteger operations // because they often appear from LocAsInteger operations

// and we don't know how to combine a LocAsInteger // and we don't know how to combine a LocAsInteger

// with a concrete value. // with a concrete value.

if (Loc::isLocType(S->getLHS()->getType()) != if (Loc::isLocType(S->getLHS()->getType()) !=

Loc::isLocType(S->getRHS()->getType())) Loc::isLocType(S->getRHS()->getType()))

return skip(S); return skip(S);

SVal LHS = Visit(S->getLHS()); SVal LHS = getConstOrVisit(S->getLHS());

SVal RHS = Visit(S->getRHS()); SVal RHS = getConstOrVisit(S->getRHS());

if (isUnchanged(S->getLHS(), LHS) && isUnchanged(S->getRHS(), RHS)) if (isUnchanged(S->getLHS(), LHS) && isUnchanged(S->getRHS(), RHS))

return skip(S); return skip(S);

return cache( return cache(

S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType())); S, SVB.evalBinOp(State, S->getOpcode(), LHS, RHS, S->getType()));

} }

SVal VisitSymExpr(SymbolRef S) { return nonloc::SymbolVal(S); } SVal VisitSymExpr(SymbolRef S) { return nonloc::SymbolVal(S); }

Show All 23 Lines

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp

This file was added.

				// RUN: %clang_analyze_cc1 %s \
				// RUN: -analyzer-checker=core \
				// RUN: -analyzer-checker=debug.ExprInspection \
				// RUN: -analyzer-config eagerly-assume=false \
				// RUN: -verify

				// Here we test whether the SValBuilder is capable to simplify existing
				// compound SVals (where there are at leaset 3 symbols in the tree) based on
				// newly added constraints.

				void clang_analyzer_eval(bool);
				void clang_analyzer_warnIfReached();

				void test_left_tree_constrained(int x, int y, int z) {
				if (x + y + z != 0)
				return;
				if (x + y != 0)
				return;
				clang_analyzer_eval(x + y + z == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(z == 0); // expected-warning{{TRUE}}
				x = y = z = 1;
				return;
				}
				steakhalUnsubmitted Done Reply Inline Actions It feels odd that in some of your examples you return some value, but in the rest, you don't. steakhal: It feels odd that in some of your examples you return some value, but in the rest, you don't.
				martongAuthorUnsubmitted Done Reply Inline Actions yep, I agree, updated. martong: yep, I agree, updated.

				void test_right_tree_constrained(int x, int y, int z) {
				if (x + y * z != 0)
				return;
				if (y * z != 0)
				return;
				clang_analyzer_eval(x + y * z == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(y * z == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(x == 0); // expected-warning{{TRUE}}
				steakhalUnsubmitted Done Reply Inline Actions You could additionally assert that `y == 0` and `z == 0`. steakhal: You could additionally assert that `y == 0` and `z == 0`.
				martongAuthorUnsubmitted Done Reply Inline Actions `y * z == 0` does not imply that both `y` and `z` are `0`. However, we could expect that one of them is `0`, but currently we don't have such a deduction (this might be done in `ConstraintAssignor` in an independent patch). martong: `y * z == 0` does not imply that both `y` and `z` are `0`. However, we could expect that one of…
				martongAuthorUnsubmitted Done Reply Inline Actions this might be done in ConstraintAssignor in an independent patch Actually, it is not a good idea to bifurcate there (and architecturally impossible ATM), so we should implement that differently. martong: > this might be done in ConstraintAssignor in an independent patch Actually, it is not a good…
				return;
				}

				void test_left_tree_constrained_minus(int x, int y, int z) {
				if (x - y - z != 0)
				return;
				if (x - y != 0)
				return;
				clang_analyzer_eval(x - y - z == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(x - y == 0); // expected-warning{{TRUE}}
				clang_analyzer_eval(z == 0); // expected-warning{{TRUE}}
				x = y = z = 1;
				return;
				}

				void test_SymInt_constrained(int x, int y, int z) {
				if (x * y * z != 4)
				return;
				steakhalUnsubmitted Done Reply Inline Actions What if `z` were in the middle? Would it still pass? steakhal: What if `z` were in the middle? Would it still pass?
				martongAuthorUnsubmitted Done Reply Inline Actions No, that would not pass. This is because we do not handle commutativity, for that we should have a canonical form of the symbol trees (check out point 3 here https://reviews.llvm.org/D102696#2784624). martong: No, that would not pass. This is because we do not handle commutativity, for that we should…
				if (z != 2)
				return;
				if (x * y == 3) {
				clang_analyzer_warnIfReached(); // no-warning
				return;
				}
				(void)(x * y * z);
				}

				void test_SValBuilder_simplifies_IntSym(int x, int y, int z) {
				// Most IntSym BinOps are transformed to SymInt in SimpleSValBuilder.
				// Division is one exception.
				x = 77 / (y + z);
				if (y + z != 1)
				return;
				steakhalUnsubmitted Done Reply Inline Actions Would the test pass if you were using `z + y != 1` here? steakhal: Would the test pass if you were using `z + y != 1` here?
				martongAuthorUnsubmitted Done Reply Inline Actions No, because we do not handle commutativity. martong: No, because we do not handle commutativity.
				clang_analyzer_eval(x == 77); // expected-warning{{TRUE}}
				(void)(x * y * z);
				}

				void recurring_symbol(int b) {
				if (b * b != b)
				if ((b * b) * b * b != (b * b) * b)
				if (b * b == 1) // no-crash (assert should not fire)
				clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
				}
				steakhalUnsubmitted Done Reply Inline Actions I guess, this statement should not be reachable. Please demonstrate it by using the `clang_analyzer_WarnIfReached()`. steakhal: I guess, this statement should not be reachable. Please demonstrate it by using the…
				martongAuthorUnsubmitted Done Reply Inline Actions Actually, L72 simplifies to `1 != b`, L73 simplifies to the same, L74 is `b ^ 2 == 1`. And this system has a solution: `b == -1` satisfies both L72 and L74 conditions. So, this line is reachable, correctly and I've updated the test case like so. Note that, however, the ConstraintAssignor is not smart enough to deduce that `b == -1`. I have a WIP branch to accomplish that here. martong: Actually, L72 simplifies to `1 != b`, L73 simplifies to the same, L74 is `b ^ 2 == 1`. And this…
				steakhalUnsubmitted Done Reply Inline Actions Uh, you are right. Thanks. steakhal: Uh, you are right. Thanks.

clang/test/Analysis/taint-tester.c

Show First 20 Lines • Show All 53 Lines • ▼ Show 20 Lines	void taintTracking(int x) {
int ty = xy.y; // FIXME: This should be tainted as well.		int ty = xy.y; // FIXME: This should be tainted as well.
char ntz = xy.z;// no warning		char ntz = xy.z;// no warning
// Now, scanf scans both.		// Now, scanf scans both.
scanf("%d %d", &xy.y, &xy.x);		scanf("%d %d", &xy.y, &xy.x);
int ttx = xy.x; // expected-warning + {{tainted}}		int ttx = xy.x; // expected-warning + {{tainted}}
int tty = xy.y; // expected-warning + {{tainted}}		int tty = xy.y; // expected-warning + {{tainted}}
}		}

void BitwiseOp(int in, char inn) {		void BitwiseOp(int in, char inn, int zz) {
// Taint on bitwise operations, integer to integer cast.		// Taint on bitwise operations, integer to integer cast.
int m;		int m;
int x = 0;		int x = 0;
scanf("%d", &x);		scanf("%d", &x);
int y = (in << (x << in)) * 5;// expected-warning + {{tainted}}		int y = (in << (x << in)) * 5;// expected-warning + {{tainted}}
// The next line tests integer to integer cast.		// The next line tests integer to integer cast.
int z = y & inn; // expected-warning + {{tainted}}		int z = y & inn; // expected-warning + {{tainted}}
if (y == 5) // expected-warning + {{tainted}}		if (y == zz) { // expected-warning + {{tainted}}
m = z \| z;// expected-warning + {{tainted}}		m = z \| z;// expected-warning + {{tainted}}
		}
else		else
m = inn;		m = inn;
int mm = m; // expected-warning + {{tainted}}		int mm = m; // expected-warning 1 {{tainted}}
}		}

// Test getenv.		// Test getenv.
char getenv(const char name);		char getenv(const char name);
void getenvTest(char *home) {		void getenvTest(char *home) {
home = getenv("HOME"); // expected-warning + {{tainted}}		home = getenv("HOME"); // expected-warning + {{tainted}}
if (home != 0) { // expected-warning + {{tainted}}		if (home != 0) { // expected-warning + {{tainted}}
char d = home[0]; // expected-warning + {{tainted}}		char d = home[0]; // expected-warning + {{tainted}}
▲ Show 20 Lines • Show All 116 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the treeClosedPublic

Details

Diff Detail

Event Timeline

Revision Contents

Diff 390637

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp

clang/test/Analysis/taint-tester.c

[Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree
ClosedPublic