This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree
ClosedPublic

Authored by martong on May 28 2021, 6:40 AM.

Details

Summary
Add the capability to simplify more complex constraints where there are 3
symbols in the tree. In this change I extend simplifySVal to query constraints
of children sub-symbols in a symbol tree. (The constraint for the parent is
asked in getKnownValue.)

Diff Detail

Event Timeline

martong created this revision.May 28 2021, 6:40 AM
martong requested review of this revision.May 28 2021, 6:40 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2021, 6:40 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Looking great, thanks!
I have a couple of notes below.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

Do we actually produce them?

1170–1172

It seems like a very common pattern now:

SVal Const = getConst(Expr);
if (Const.isUndef())
  return Visit(Expr);
return Const;

At least we can make a function out of this, so we don't repeat this pattern over and over.
But I was thinking more radically to actually replace Visit with this, what do you think?

clang/test/Analysis/simplify-complex-constraints.cpp
25–27

Since it's not really specific to addition, maybe we can have tests for other operators?

ASDenysPetrov added inline comments.Jun 2 2021, 3:47 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

You can met SymbolCast here if evalCast produced it.
See SValBuilder::evalCastSubKind(nonloc::SymbolVal...):

if (!Loc::isLocType(CastTy))
  if (!IsUnknownOriginalType || !CastTy->isFloatingType() ||
      T->isFloatingType())
    return makeNonLoc(SE, T, CastTy);

Also my patch will produce a lot of integral cast symbols D103096. You are welcome to examine it.

vsavchenko added inline comments.Jun 2 2021, 4:03 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

In my comment I meant IntSymExpr

ASDenysPetrov added inline comments.Jun 2 2021, 4:38 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

Oh, anyway :-)

ASDenysPetrov added inline comments.Jun 2 2021, 5:01 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

I think we should remove ConcreteInt and and as a consequence IntSymExpr, SymIntExpr in the nearest future. SymbolVal should be reworked to store a QualType, a RangeSet and an expression pointer itself. IMO SymSymExpr and SymbolVal is sufficient to make any calculations. It may help generalize a lot of them. ConcreteInt is just a special case for SymbolVal. More than that PointerToMember can be removed as well, because we would get an info being a pointer to member from QualType in this way.

ASDenysPetrov added a comment.EditedJun 2 2021, 7:03 AM

Returning to the discussion raised in D102696, I'd like to share my vision.
I think we can use much easier approach to use valid constraints at any point of time.
The main idea is lazy-reasoning of the ranges.
This approach:

  • doesn't need to use canonicalization;
  • doesn't need to update all the constraints on each setConstraint call (using brute-force or any other structure traversals);
  • add an additional bindings for every simple operand on each assignment or initialization.
  • remove an invalid bindings for every simple operand on each assignment or initialization.
  • recursively get the range when and only when needed (lazy-reasoning).

Example:

a = b + 1;
c = a - 5;
if (b != 42) return 0;
if(c == 38) return 1;
return 0;

a = b + 1;

  1. LHS a lookup. LHS not found.
  2. Add binding $a = $b + 1
  3. Traverse through RHS b + 1.
  4. b lookup. b not found.
  5. Built a new expression a - 1 for b.
  6. Add binding for b: $b = $a - 1
  7. 1 is a constant. Skip.

c = a - 5;

  1. LHS c lookup. LHS not found.
  2. Add binding $c = $a - 5
  3. Traverse through RHS a - 5.
  4. a lookup. a found.
  5. If LHS not found and a found, then skip.
  6. 5 is a constant. Skip.

if (b != 42) return;

  1. Get the range of $b. Direct range not found. Add $b to the list of visited symbols.
  2. Substitute bindings. From $b != 42 to $a - 1 == 42.
  3. Traverse through $a - 1.
  4. Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
  5. 1 is a constant. Skip.
  6. Substitute bindings. From $a - 1 == 42 to $b + 1 - 1 == 42.
  7. Traverse through $b + 1 - 1.
  8. $b is in the list of already visited symbols. No way to find a range.
  9. Create a range from the type [MIN, MAX].
  10. Perform comparison and produce a constraint for b [42, 42].

if(c == 38) return;

  1. Get the range of $c. Direct range not found. Add $c to the list of visited symbols.
  2. Substitute bindings. From $c == 38 to $a - 5 == 38.
  3. Traverse through $a - 5.
  4. Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
  5. 5 is a constant. Skip.
  6. Substitute bindings. From $a - 5 == 38 to $b + 1 - 5 == 38.
  7. Traverse through $b + 1 - 5.
  8. Get the range of $b. Direct range found [42, 42].
  9. 1 is a constant. Skip.
  10. Calculate(simplificate the expression etc.) a new range for $c [38, 38].
  11. Condition is true.

return 1

P.S. I've just started thinking of the way to solve this problem. For now it's just a sketch of where I'm going. I think this can even replace EquivalenceClass.

Returning to the discussion raised in D102696, I'd like to share my vision.
I think we can use much easier approach to use valid constraints at any point of time.
The main idea is lazy-reasoning of the ranges.
This approach:

  • doesn't need to use canonicalization;
  • doesn't need to update all the constraints on each setConstraint call (using brute-force or any other structure traversals);
  • add an additional bindings for every simple operand on each assignment or initialization.
  • remove an invalid bindings for every simple operand on each assignment or initialization.
  • recursively get the range when and only when needed (lazy-reasoning).

Example:

a = b + 1;
c = a - 5;
if (b != 42) return 0;
if(c == 38) return 1;
return 0;

a = b + 1;

  1. LHS a lookup. LHS not found.
  2. Add binding $a = $b + 1
  3. Traverse through RHS b + 1.
  4. b lookup. b not found.
  5. Built a new expression a - 1 for b.
  6. Add binding for b: $b = $a - 1
  7. 1 is a constant. Skip.

c = a - 5;

  1. LHS c lookup. LHS not found.
  2. Add binding $c = $a - 5
  3. Traverse through RHS a - 5.
  4. a lookup. a found.
  5. If LHS not found and a found, then skip.
  6. 5 is a constant. Skip.

if (b != 42) return;

  1. Get the range of $b. Direct range not found. Add $b to the list of visited symbols.
  2. Substitute bindings. From $b != 42 to $a - 1 == 42.
  3. Traverse through $a - 1.
  4. Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
  5. 1 is a constant. Skip.
  6. Substitute bindings. From $a - 1 == 42 to $b + 1 - 1 == 42.
  7. Traverse through $b + 1 - 1.
  8. $b is in the list of already visited symbols. No way to find a range.
  9. Create a range from the type [MIN, MAX].
  10. Perform comparison and produce a constraint for b [42, 42].

if(c == 38) return;

  1. Get the range of $c. Direct range not found. Add $c to the list of visited symbols.
  2. Substitute bindings. From $c == 38 to $a - 5 == 38.
  3. Traverse through $a - 5.
  4. Get the range of $a. Direct range not found. Add $a to the list of visited symbols.
  5. 5 is a constant. Skip.
  6. Substitute bindings. From $a - 5 == 38 to $b + 1 - 5 == 38.
  7. Traverse through $b + 1 - 5.
  8. Get the range of $b. Direct range found [42, 42].
  9. 1 is a constant. Skip.
  10. Calculate(simplificate the expression etc.) a new range for $c [38, 38].
  11. Condition is true.

return 1

P.S. I've just started thinking of the way to solve this problem. For now it's just a sketch of where I'm going.

Great to know that you are also thinking about this problem!
The more the merrier!

I have a couple of notes about your suggestion.
First is minor, a = b + 1 and c = a + 5 won't actually produce any constraints. Starting from those statements if you request a symbol for a you will get b + 1 and b + 6 for c.
Of course, you can change your code so that we make an assumption that a == b + 1 and c == a + 5 and those would be constraints. That's why I said that it's minor.

Now, for the biggest concern. You cover in a very detailed manner all the traversals and caches for visited symbols, but have Substitute bindings as a sub-step , which is HUGE and absolutely non-trivial.
In your examples, you have a linear chain of related symbolic expressions, i.e. something like this x -> y + 2, y -> z + 3, z -> 4 so that we have x -> 9 after three substitutions. But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N. Imagine that each yK is involved with another M constraints. So, you would need to check x -> y1 -> z11, if it's not working - check x -> y1 -> z12 and so on. When you run out of options for z1K, you need to backtrack and try x -> y2 -> z21, etc. As you can see, this is BRUTE FORCE.

And it couldn't be easier theoretically. So, instead of trying that many options every time we need to judge about, we take a bit simpler shortcut. It's far from being complete, but honestly we're not pursuing completeness. For my example above, whenever we find out that zKM -> 5, we simplify everything that involves zKM and get yK -> 10 and get x -> 15. It is far from being perfect, I don't like that we need to go through all symbols and try to simplify every single symbol, and it would be good to optimize this scenario (mapping symbols to all symbolic expressions that contain them, for example), but essentially it is cheaper in the long run.

But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N.

In my approach it can't be more then one binding for one symbol. Like:
x = y + z; produces $x = $y + $z, $y = $x - $z, $z = $x - $y.
Then if z = a;, all above should be invalidated (removed), because we can't rely on those expressions any more. And new ones should be added $z = $a and $a = $z.

As you can see, this is BRUTE FORCE.

I would say this is a linked list with stop-markers, not blindly passing through all the bindings. And I don't talk about implementation details. For now we just need to elaborate an approach.

But the problem it is generally not one-to-one relationship, so x -> y1 + 1, x -> y2 + 2, ... , x -> yN + N.

In my approach it can't be more then one binding for one symbol. Like:
x = y + z; produces $x = $y + $z, $y = $x - $z, $z = $x - $y.

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

ASDenysPetrov added a comment.EditedJun 2 2021, 9:11 AM

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

Are you talking about comparison or assignment? Both assignments can't be valid at the same time, and latter replaces bindings of former. In case of comparisons, they both can be valid.
But we should keep in mind that assignment is a write operation which replaces and invalidates previous bindings, and comparison is a read operation. It can add new bindings but can not remove old ones.
This is what I haven't dig deep enough yet. Let's do this together how we can handle that.

Hmm, Okay, but what about situations if you have: a = a1 + a2 and a = a3 + a4 + a5 are you going to throw away one of these constraints? And if so, how do you want to select which one?

Are you talking about comparison or assignment? Both assignments can't be valid at the same time, and latter replaces bindings of former. In case of comparisons, they both can be valid.
But we should keep in mind that assignment is a write operation which replaces and invalidates previous bindings, and comparison is a read operation. It can add new bindings but can not remove old ones.
This is what I haven't dig deep enough yet. Let's do this together how we can handle that.

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:

int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`

There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

ASDenysPetrov added a comment.EditedJun 2 2021, 10:24 AM

! In D103317#2793797, @vsavchenko wrote:

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:

int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`

There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

! In D103317#2793797, @vsavchenko wrote:

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:

int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`

There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

Thank you for your comments Denys! It's great to see that some else is also bothered by these problems, and the more approaches we have the deeper the overall understanding is.

! In D103317#2793797, @vsavchenko wrote:

I replied to you earlier that assignments are not producing constraints. The analyzer has some sort of SSA (not really, but anyways), so every time we reassign the variable it actually becomes a different symbol. So, from this perspective DeclRefExpr x and symbol x are two different things:

int x = foo(); // after this DeclRefExpr and VarDecl x are associated with conjured symbol #1 (aka conj#1)
int c = x + 10; // c is associated with `$conj#1 + 10`
x = a + 1; // DeclRefExpr x is now associated with `$a + 1`
x = c + x; // and now it is `conj#1 + 11`

There is no symbolic assignment in the static analyzer. Symbols are values, values don't change. We can only obtain more information of what particular symbolic value represents.

I hope it makes it more clear.

As for equality, we already track equivalence classes of symbols and they can have more than 2 symbols in them. Actually, this whole data structure mostly makes sense for cases when the class has more than 2 elements in it.

Thanks for the clarifications.

I'm trying to say that we need to keep our attention on assignments/initializations as well if we want to solve this problem efficiently. I didn't say that assignments produce constraints. Of course they do not :-). I want we produce more bindings(associations) on assignments to use them as mapping for further substitutions and traversing to get the right range.

As for the equivalence classes. I think we should go away from special cases. I mean what about lower-then classes or greater-or-equal classes. I think you got what I mean. I'm trying to find a general approach which can cover almost all individual improvements around CSA codebase(about ranges).

I started working on this when saw your discussion. Just what you also consider the direction I'm moving. A lot of corner-examples, like you gave me, are welcome. Thank you for the feedbacks!

Assignments and initializations don't affect anything, we shouldn't care about them at all because they don't affect symbolic values in any way. You keep bringing up assignments/initializations when I repeatedly tell you that these are orthogonal to symbolic values, they simply associate certain symbolic values/expressions with memory regions (aka bind them).

I want to make sure you understand that there is no silver bullet here. There is no one magic method that will resolve everything in polynomial let alone linear time (unless someone proves that NP = P). This problem is inherently exponential. We try to avoid it at all costs (and, thus, suck sometimes).
In general, when SMT-solvers need to solve a set of bit-vector equations, they use a technique called bit blasting - just turning each n-bit integer into n-separate boolean variables, and using SAT solver after that. More advanced solvers use special theories before to reduce dimensionality of the problem, one of them is equality theory. These theories are not limitations or "special cases", they are pieces of fundamental knowledge about the domain. Those are as generic as you can go. One symbolic expression can be equal to multiple symbolic expressions at the same time, it is the fact, not a special case.

I'm sorry for being too harsh here, but this problem IS harder than you think. If you want to approach it correctly, and not waste your time, you should get these fundamentals straight.

@vsavchenko

I appologize for my immaturity and for the audacity of my infantile assumptions. I admit any constructive criticism. Thank you for referencing the theory. I will thoroughly study the domain in all.

As for the assignments and why I brought them up, that's because I just tried to look at the problem from another angle. Consider two snippets:

void f(int a, int b, int c, int x, int y){
  a = b;
  x = y;
  c = a;
  // What can you say about a, x, c?
  // `a` equals to `b` and 'c'
  // `x` equals to `y`
  // `c` equals to `a` and `b`
}

and

void f(int a, int b, int c, int x, int y){
  if(a == b) {
    if(x == y) {
      if(c == a) {
        // What can you say about a, x, c?
        // `a` equals to `b` and 'c'
        // `x` equals to `y`
        // `c` equals to `a` and `b`
      }
    }
  }
}

They both generates the same relationship between the variables. Assignment seemed to me as just a special case in a true branch of == operator and can be substituted with it in calculations. So, I decided to make some assumptions in this way. Digging more, it opened a lot of gaps and lacks for me.
Next time I will investigate deeper to present a finalized solution and not wasting your time explaining me the basics.

@vsavchenko

I appologize for my immaturity and for the audacity of my infantile assumptions. I admit any constructive criticism. Thank you for referencing the theory. I will thoroughly study the domain in all.

No need to apologize! I do like your enthusiasm, great things come from it. But as I said, I don't want you to waste your own time pursuing things that won't work.

As for the assignments and why I brought them up, that's because I just tried to look at the problem from another angle. Consider two snippets:

OK, let me annotate the code, so we can hopefully close this topic once and for all.

void f(int a, int b, int c, int x, int y){

At this point we have 5 memory regions a, b, c, x and y. They represent the memory cells allocated for the corresponding parameters on stack. When we call function, these actually have some values, but we don't know them due to the limitations of the analysis. For this reason, we say that they contain symbolic values. So, let's simply enumerate that so that there is no misunderstanding. Values: #1, #2, #3, #4, and #5. These are not real numbers, we enumerate some unknown values. As I said, we introduced them as initial values for parameters, so our store (mapping from memory to values) looks likes this: a -> #1, b -> #2, c -> #3, x -> #4, y -> #5. We don't know anything about these values at this point.

a = b;

What do we know at this point? Do we get more information about the value that was stored in a before? No, we overwrote this value entirely, and what changed here is the mapping: a -> #2, b -> #2, c -> #3, x -> #4, y -> #5 because b contained value #2. Symbolic value #1 is gone, it's dead. No-one will ever see or use it again.

x = y;

Using the same logic: a -> #2, b -> #2, c -> #3, x -> #5, y -> #5.

c = a;

And again: a -> #2, b -> #2, c -> #2, x -> #5, y -> #5.

  // What can you say about a, x, c?
  // `a` equals to `b` and 'c'
  // `x` equals to `y`
  // `c` equals to `a` and `b`
}

Yes, they are, but we don't need constraints for that. We know which values are associated with each memory region and we can compare them. That's all, this simple. That's why assignments and initializations DON'T MATTER for the solver, these are simply STORE of some symbolic value into memory. Solver cares only about symbolic values, memory is irrelevant.

and

void f(int a, int b, int c, int x, int y){

As before: a -> #1, b -> #2, c -> #3, x -> #4, y -> #5.

if(a == b) {

Now, the value that we imagine to be in memory region a (namely #1) is NOT GONE! It is still there, it is being used. And what we know after this if, is that on one of the paths after it #1 == #2 and on the other path #1 != #2. These are just values that we know nothing about, but we started to gain some knowledge.
So, the store is the same (the values stored in those memory cell are the same as they were before): a -> #1, b -> #2, c -> #3, x -> #4, y -> #5, but now we add constraint #1 == #2.

if(x == y) {

The same here, store is the same, but constraints are now: #1 == #2 and #4 == #5.

if(c == a) {

#1 == #2 == #3 and #4 == #5

        // What can you say about a, x, c?
        // `a` equals to `b` and 'c'
        // `x` equals to `y`
        // `c` equals to `a` and `b`
      }
    }
  }
}

We can actually figure out that #2 == #3 because of equality theory. We only knew #1 == #2 and that #3 == #1, but equality is a transitive binary relation and we reflect that notion in our equivalence classes data-structure.

They both generates the same relationship between the variables. Assignment seemed to me as just a special case in a true branch of == operator and can be substituted with it in calculations. So, I decided to make some assumptions in this way. Digging more, it opened a lot of gaps and lacks for me.

Relationship between variables are the same, but neither solver nor the analyzer reason about things in terms of variables. It's memory and values that the memory carries. In the first example, we have only 2 values stored in 5 memory locations. In the second example, it's 5 values stored in 5 memory locations. On some of the paths we happen to know how those values are related (which is not true for other paths), that's constraints.

I hope now it's clear.

Next time I will investigate deeper to present a finalized solution and not wasting your time explaining me the basics.

We probably should not have discussions like these because they are not directly about the patch on review. If you want to continue this discussion, let's bring it to [cfe-dev].

martong updated this revision to Diff 386796.Nov 12 2021, 4:04 AM
martong marked 7 inline comments as done.
  • Rebase on top of parent revisions
  • Address review comments
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1155–1156

Do we actually produce them?

Yes we produce IntSymExprs.

1170–1172

Good idea!

clang/test/Analysis/simplify-complex-constraints.cpp
25–27

Ok, I've added a new test, and modified a bit this test.

martong retitled this revision from [Analyzer][engine][solver] Simplify complex constraints to [Analyzer][Core] Make SValBuilder to better simplify svals with 3 symbols in the tree.Nov 12 2021, 4:33 AM
martong edited the summary of this revision. (Show Details)
martong updated this revision to Diff 386806.Nov 12 2021, 4:35 AM
martong edited the summary of this revision. (Show Details)
  • Update in comments: Unknown -> Undef

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1149

Let's be explicit about it.

1153–1156
clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
23 ↗(On Diff #386806)

It feels odd that in some of your examples you return some value, but in the rest, you don't.

32 ↗(On Diff #386806)

You could additionally assert that y == 0 and z == 0.

50 ↗(On Diff #386806)

What if z were in the middle? Would it still pass?

65 ↗(On Diff #386806)

Would the test pass if you were using z + y != 1 here?

martong updated this revision to Diff 389198.Nov 23 2021, 7:24 AM
martong marked 6 inline comments as done.
  • Return explicitly with UndefinedVal
  • Unify test cases (return 0 -> return)
martong added inline comments.Nov 23 2021, 7:26 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1149

Good point.

1153–1156

Makes sense.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
23 ↗(On Diff #386806)

yep, I agree, updated.

32 ↗(On Diff #386806)

y * z == 0 does not imply that both y and z are 0. However, we could expect that one of them is 0, but currently we don't have such a deduction (this might be done in ConstraintAssignor in an independent patch).

50 ↗(On Diff #386806)

No, that would not pass. This is because we do not handle commutativity, for that we should have a canonical form of the symbol trees (check out point 3 here https://reviews.llvm.org/D102696#2784624).

65 ↗(On Diff #386806)

No, because we do not handle commutativity.

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

Sure!

Teaser:

martong added inline comments.Nov 23 2021, 7:39 AM
clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
32 ↗(On Diff #386806)

this might be done in ConstraintAssignor in an independent patch

Actually, it is not a good idea to bifurcate there (and architecturally impossible ATM), so we should implement that differently.

To me at least, the patch looks good.
Please post some comparative measurements to demonstrate it won't introduce runtime regression.

Sure!

Teaser:

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1110

Where did you address this FIXME?

martong marked an inline comment as done.Nov 23 2021, 8:25 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1110

I didn't, but this FIXME became obsolete at some point in the past.

The reasons:

  1. We do support SymExprs. In simplifySVal we have a full blown SymExpr visitor implemented that checks for constant values in the symbol subtrees. And at L1103 we query the top symbol of tree as well.
  1. The git history shows that the FIXME was previously this: // FIXME: Add support for SymExprs in RangeConstraintManager. But we do support SymExprs already in RangeConstraintManager ...
steakhal added inline comments.Nov 23 2021, 8:40 AM
clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp
1110

Thanks.

martong updated this revision to Diff 389935.Nov 26 2021, 1:14 AM
martong marked an inline comment as done.
  • Add new test case (for no-crash)
martong updated this revision to Diff 389936.Nov 26 2021, 1:16 AM
  • Remove superfluous param (int a) from the new test case

I've added a new test case that triggers an assertion. That problem is getting fixed in a new parent patch, which I am about to add to the stack very soon.

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

I did. Average memory usage is unaffected.

Please repeat the measurement for openssl. There must have been some interference in the memory consumption. Aside from that the results look great.

I did. Average memory usage is unaffected.

It's good enough for me, thanks.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
75 ↗(On Diff #389936)

I guess, this statement should not be reachable.
Please demonstrate it by using the clang_analyzer_WarnIfReached().

martong marked an inline comment as done.Nov 29 2021, 6:57 AM
martong added inline comments.
clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
75 ↗(On Diff #389936)

Actually, L72 simplifies to 1 != b, L73 simplifies to the same, L74 is b ^ 2 == 1. And this system has a solution: b == -1 satisfies both L72 and L74 conditions.
So, this line is reachable, correctly and I've updated the test case like so.

Note that, however, the ConstraintAssignor is not smart enough to deduce that b == -1. I have a WIP branch to accomplish that here.

martong updated this revision to Diff 390343.Nov 29 2021, 6:57 AM
martong marked an inline comment as done.
  • Update a test case
  • Rebase to the parent patch
steakhal accepted this revision.Nov 29 2021, 8:55 AM

Still looks great.

clang/test/Analysis/svalbuilder-simplify-compound-svals.cpp
75 ↗(On Diff #389936)

Uh, you are right. Thanks.

This revision is now accepted and ready to land.Nov 29 2021, 8:55 AM