This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][solver] Simplify existing constraints when a new constraint is added
ClosedPublic

Authored by martong on May 28 2021, 6:26 AM.

Details

Summary

Update setConstraint to simplify existing constraints (and adding the
simplified constraint) when a new constraint is added. In this patch we just
simply iterate over all existing constraints and try to simplfy them with
simplifySVal. This solves the simplest problematic cases where we have two
symbols in the tree, e.g.:

int test_rhs_further_constrained(int x, int y) {
  if (x + y != 0)
    return 0;
  if (y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}}
  return 0;
}

This patch is the first step of a sequence of patches, and not intended to be
commited as a standalone change. The sequence of patches (and the plan) is
described here: https://reviews.llvm.org/D102696#2784624

Diff Detail

Event Timeline

martong created this revision.May 28 2021, 6:26 AM
martong requested review of this revision.May 28 2021, 6:26 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 28 2021, 6:26 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
vsavchenko requested changes to this revision.EditedMay 28 2021, 6:53 AM

Hey, great job! This is really something that we need, but it's implemented not entirely correctly.
I tried to cover it in the inline comment.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1586–1602

I tried to cover it in the comment to another patch. This solution includes a lot of extra work and it will lose equality/disequality information for simplified expressions, and I think it's safe to say that if a == b then simplify(a) == b.

Let's start with getConstraintMap. It is a completely artificial data structure (and function) that exists for Z3 refutation. It's not what we keep in the state and it has a lot of duplicated constraints. If we have an equivalence class {a, b, c, d, e, f}, we store only one constraint for all of them (thus when we update the class, or one of the members receives a new constraint, we can update all of them). getConstraintMap returns a map where a, b, c, d, e, and f are mapped to the same constraint. It's not super bad, but it's extra work constructing this map and then processing it.

Another, and more important aspect is that when you setConstraint, you lose information that this symbol is equal/disequal to other symbols. One example here would be a situation where x + y == z, and we find out that y == 0, we should update equivalence class {x + y, z} to be a class {x, z}. In order to do this, you need to update two maps: ClassMap (it's mapping x + y to {x + y, z}) and ClassMembers (it's mapping {x + y, z} to x + y and z).

Similar example can be made with x + y != z, but updating ClassMap and ClassMembers will fix it. And you don't even need to touch the actual mapping with the actual constraints.

This revision now requires changes to proceed.May 28 2021, 6:53 AM

Thanks Valeriy for the quick review and guidance! I am planning to do the changes and continue next week :)

vsavchenko added inline comments.May 28 2021, 7:31 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1585

Also I think we can introduce a simple, but efficient optimization of kicking off the simplification process only when Constraint is a constant.

martong marked an inline comment as done.May 31 2021, 8:26 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1585

Yes, good point.

1586–1602

Absolutely, great findings!

I think the most straightforward and consistent implementation of updating ClassMap and ClassMembers is to directly use the merge method. I.e. we can merge the simplified symbol (as a trivial eq class) to the existing equivalence class. Using merge, however, would not remove the non-simplified original symbol. But this might not be a problem; rather it is a necessity (as the child patch demonstrates) it might be very useful if we can find the symbol (without simplification, i.e. as written) in the ConstraintRange map. Do you see any drawbacks of reusing merge here?

martong updated this revision to Diff 348808.May 31 2021, 8:27 AM
martong marked an inline comment as done.
  • Merge the simplified symbol to the old class
vsavchenko added a comment.EditedMay 31 2021, 8:48 AM

That's awesome, just a few stylistic tweaks and tests and we are ready to land!

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1586–1587

I think we need a comment why we care about this early exit.

1586–1602

Oh, that's actually even better!

If we consider the following example.
Let a + b == c and a == d be known and b == 0 to be a new constraint. Then your approach will help us to figure out that c == d.

So, you found a great way!

I think that we should still add the test cases I briefly described in my previous comment and that one from above.

1589

Here I also think that we need to give more context to the readers, so they understand what simplification you are talking about here.

1591–1594

You don't actually use constraints here, so (let me write it in python) instead of:

[update(classMap[class]) for class, constraint in constraints.items()]

you can use

[update(members) for class, members in classMap.items()]
1600

It would be great if we provide some justification why we do merge here.

vsavchenko added inline comments.May 31 2021, 9:08 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1601–1602

Uh-oh, almost let yet another null-state bug to happen! During this iteration, State can become null, so we need to check for it.

vsavchenko added a comment.EditedMay 31 2021, 9:33 AM

I had another thought, merge is usually called in situations when we found out that two symbols should be marked equal (and checked that it's possible to begin with), which is not true in your case.

If we update my case from before, we can get: a + b == c and a != c as given, and b == 0 as a new constraint. In this situation, you will merge classes {a + b, c} and {a}, which contradicts our existing disequality information.

martong marked 5 inline comments as done.EditedJun 1 2021, 5:16 AM

I had another thought, merge is usually called in situations when we found out that two symbols should be marked equal (and checked that it's possible to begin with), which is not true in your case.

If we update my case from before, we can get: a + b == c and a != c as given, and b == 0 as a new constraint. In this situation, you will merge classes {a + b, c} and {a}, which contradicts our existing disequality information.

Yes, we must check the disequivalence classes to discover such a contradiction, I updated the code to do so. Also, added a test case for the contradiction handling.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1591–1594

Actually, trivial equivalence classes (those that have only one symbol member) are not stored in the State. Thus, we must skim through the constraints as well in order to be able to simplify symbols in the constraints.
In short, we have to iterate both collections.

1601–1602

Good catch!

martong updated this revision to Diff 348945.Jun 1 2021, 5:16 AM
martong marked 2 inline comments as done.
  • Simplify equivalence classes when iterate over ClassMap, simplify constraints by iterating over the ConstraintsMap
martong marked 2 inline comments as done.Jun 1 2021, 5:46 AM

I was wondering if there is a direct way to check the equivalence classes?
I am thinking about to add a clang_annalyzer_dump_equivalence_classes function to the ExprInspection checker.

Awesome!
I know, I said that we are ready to land, but I think I was too excited about this change. We probably should have some data on how it performs on real-life codebases.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1580

Maybe it should be a simplify method of the class itself?

1589–1594

I think we can add a method isDisequalTo or just use areEqual in a this way:
are equal?

[Yes] -> nothing to do here
[No]  -> return nullptr
[Don't know] -> merge
1591–1594

Ah, I see. Then I would say that your previous solution is more readable (if we keep simplify, of course).

martong marked 3 inline comments as done.Jun 2 2021, 7:06 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1580

Yeah, makes sense.

1589–1594

Good point, I've added a new overload to the static areEqual and added a method isEqualTo that uses areEqual.

1591–1594

My previous solution might be more readable, though, that's not working.
Actually, I think I failed to explain properly why do we have to iterate both collections.

  • We have to iterate the ConstraintMap because trivial constraints are not stored in the State but we want to simplify symbols in the constraints. So, if we were to iterate over only the ClassMap then the simplest test-case would fail:
int test_rhs_further_constrained(int x, int y) {
  if (x + y != 0)
    return 0;
  if (y != 0)
    return 0;
  clang_analyzer_eval(x + y == 0); // expected-warning{{TRUE}}  
  clang_analyzer_eval(y == 0);     // expected-warning{{TRUE}}     FAIL
  return 0;
}
  • We have to iterate the ClassMap in order to update all equivalence classes that we store in the State. Consider the example you brought up before:
void test_equivalence_classes_are_updated(int a, int b, int c, int d) {
  if (a + b != c)
    return;
  if (a != d)
    return;
  if (b != 0)
    return;
  // Keep the symbols and the constraints! alive.
  (void)(a * b * c * d);
  clang_analyzer_eval(c == d); // expected-warning{{TRUE}}
  return;
}

Before we start to simulate b==0, we have only these equivalence classes in the State: E1{a+b, c} and E2{a, d}. And we have these constraints: SymExpr(a+b==c) -> out-of [0, 0], SymExpr(a==d) -> out-of [0, 0].
Now, when we evaluate b==0in setConstraint when iterating the ConstraintMap then SymExpr(a+b==c) becomes SymExpr(a==c). But the equality classes are not updated. And we can update them if we scan through the ClassMap.

Another alternative solution could be to re-trigger the track mechanism when we iterate over the ConstraintMap, but track seemed to be an exclusive interface towards the higher abstraction RangedConstraintManager. On the other hand, reusing the track mechanism could result better performance than doing another iteration on the ClassMap. Do you think it would be a better approach? And how could we reuse the track mechanism without getting confused with the Adjustment stuff?

martong updated this revision to Diff 349261.Jun 2 2021, 7:06 AM
martong marked 3 inline comments as done.
  • Add isEqualTo and simplify members to EquivalenceClass
martong updated this revision to Diff 349352.Jun 2 2021, 12:32 PM

I am terribly sorry, but I uploaded an unfinished Diff previously, please disregard that. So these are the changes:

  • Add isEqualTo and simplify members to EquivalenceClass

Awesome!
I know, I said that we are ready to land, but I think I was too excited about this change. We probably should have some data on how it performs on real-life codebases.

Just some quick update on the status of this patch. I've done some measurements on smaller open source C projects (e.g tmux) and didn't see any noticeable slow-down. However, I've run into a bad-bad assertion failure in my favorite Checker (StdLibraryFu...). The assertion indicates that neither !State nor State is feasible, so this throws me back to the debugger for a while.

martong updated this revision to Diff 350898.Jun 9 2021, 7:50 AM
  • Simplify the symbol before eq tracking as well

Awesome!
I know, I said that we are ready to land, but I think I was too excited about this change. We probably should have some data on how it performs on real-life codebases.

Just some quick update on the status of this patch. I've done some measurements on smaller open source C projects (e.g tmux) and didn't see any noticeable slow-down. However, I've run into a bad-bad assertion failure in my favorite Checker (StdLibraryFu...). The assertion indicates that neither !State nor State is feasible, so this throws me back to the debugger for a while.

Finally, I could boil down the infeasible parent state problem and added a test case test_deferred_contradiction to catch that. The solution is surprisingly simple: just try to simplify the symbolic expression of an equivalency before we start to update the State with the equivalency info.

OK, we definitely need to know about performance.
Plus, I'm still curious about the crash. I didn't get how simplification helped/caused that crash.

I have one thought here. If the lack of simplification indeed caused the crash, we are in trouble with this patch. IMO simplification in just one place should make it better, but shouldn't produce infeasible states for us. In other words, any number simplifications is a conservative operation that makes our lives a bit better. The moment they become a requirement (i.e. simplifications call for more simplifications or we crash) this solution from this patch has to become much harder. This is because whenever we do merge, we essentially can create another situation when we find out that some symbolic expression is a constant. Let's say that we are merging classes A and B which have constraints [INT_MIN, 42] and [42, INT_MAX]. After the merge, we are positive that all the members of this new class are equal to 42. And if so, we can further simplify classes and their members. This algorithm turns into a fixed point algorithm, which has a good chance to sabotage our performance.

This being said, can we re-iterate on that crash and the proposed fix in much more detail?

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1787

very opinionated nit: can you please add extra new line after this?

1975–1985

Now, since you put this logic into merge, you can just merge.

clang/test/Analysis/find-binop-constraints.cpp
151

It's not really connected to your patch, but this confuses me! Why does the analyzer think that b0 is guaranteed to be 2 after this statement. Even if we eagerly assume here, shouldn't it mean that there are still two paths b0 == 2 and b0 != 2?

156–159

Hmm, I don't see how simplification helped here.

After the previous if statement, we should have had two equivalence classes known to be disequal: reg_$2<int b1> and (reg_$0<int e0>) - (reg_$1<int b0>).
Further, we directly compare these two symbols. We can figure it out without any simplifications.

Am I missing something here?

martong marked 2 inline comments as done.Jun 9 2021, 9:35 AM

I have one thought here. If the lack of simplification indeed caused the crash, we are in trouble with this patch. IMO simplification in just one place should make it better, but shouldn't produce infeasible states for us. In other words, any number simplifications is a conservative operation that makes our lives a bit better. The moment they become a requirement (i.e. simplifications call for more simplifications or we crash) this solution from this patch has to become much harder. This is because whenever we do merge, we essentially can create another situation when we find out that some symbolic expression is a constant. Let's say that we are merging classes A and B which have constraints [INT_MIN, 42] and [42, INT_MAX]. After the merge, we are positive that all the members of this new class are equal to 42. And if so, we can further simplify classes and their members. This algorithm turns into a fixed point algorithm, which has a good chance to sabotage our performance.

Yes, good point(s). I am trying to avoid turning into a fixed point algorithm by directly iterating over the equivalence classes instead of reusing the existing track mechanism. On the other hand, perhaps with some budge the fixpoint algo would be worth to experiment with.

clang/test/Analysis/find-binop-constraints.cpp
151

Don't be puzzled by this. This indeed bifurcates. The interesting path is where b0 == 2 is true.

I am going to update this line with if (b0 ==2) { to achieve a similar effect. (I was using creduce and tried to simplify even more after that, but i missed this.)

156–159

When we evaluate e2 > 0 then we will set e1 as disequal to b1. However, at this point because of the eager constant folding e1 is e0 - 2 (on the path where b0 == 2 is true).

So, when we evaluate b1 == e1 then this is the diseq info we have in the State (I used dumpDisEq from D103967):

reg_$2<int b1>
DisequalTo:
(reg_$0<int e0>) - 2

(reg_$0<int e0>) - 2
DisequalTo:
reg_$2<int b1>

And indeed we ask directly whether the LHS (reg_$2<int b1>) is equal to RHS`(reg_$0<int e0>) - (reg_$1<int b0>). This is because the DeclRefExpr` of e1 is still bound to SVal which originates from the time before we constrained b0 to 2. With other words: the Environment is not changed by introducing a new constraint.

BTW, this test fails even in llvm/main.

martong marked 2 inline comments as done.Jun 9 2021, 11:25 AM
martong added inline comments.
clang/test/Analysis/find-binop-constraints.cpp
156–159

With other words: the Environment is not changed by introducing a new constraint.

This suggests that another approach could be to do change the Environment when we add a new constraint. I am not sure about the pros/cons atm, but might be worth to experiment. What do you think?

OK, we definitely need to know about performance.

Couldn't agree more. I am in the middle of a performance measurement that I do with csa-testbench (on memchached,tmux,curl,twin,redis,vim,openssl,sqlite,ffmpeg,postgresql,tinyxml2,libwebm,xerces,bitcoin,protobuf). Hopefully I can give you some results soon.

Plus, I'm still curious about the crash. I didn't get how simplification helped/caused that crash.

So, the crash was actually an assertion failure in StdLibraryFunctionsChecker, which came when I made a test analysis run on the twin project. The assertion was here:

if (FailureSt && !SuccessSt) {
  if (ExplodedNode *N = C.generateErrorNode(NewState))
    reportBug(Call, N, Constraint.get(), Summary, C);
  break;
} else {
  // We will apply the constraint even if we cannot reason about the
  // argument. This means both SuccessSt and FailureSt can be true. If we
  // weren't applying the constraint that would mean that symbolic
  // execution continues on a code whose behaviour is undefined.
  assert(SuccessSt);                   // <----------------------------------------------------------------- This fired !!!
  NewState = SuccessSt;
}

With multiple creduce iterations below is a minimal example with StdLibraryFunctionsChecker. That crashed when we applied the BufferSize constraint of fread.

typedef int FILE;
long b;
unsigned long fread(void *__restrict, unsigned long, unsigned long,
                    FILE *__restrict);
void foo();
void c(int *a, int e0) {

  int e1 = e0 - b;
  b == 2;
  foo();

  int e2 = e1 - b;
  if (e2 > 0 && b == e1) {
    (void)a; (void)e1; (void)c;
    fread(a, sizeof(char), e1, c);
  }
}

Turned out, the checker had the assertion because before applying the arg constraint and its negated counterpart, the state was already infeasible. (But the analyzer recognized this only when it added the new assumptions when checking the applicability of the arg constraint.)
Thus, I could remove fread and the Checker from the problem set and could create the test case that synthesizes the unfeasible state.

martong updated this revision to Diff 350966.Jun 9 2021, 11:52 AM
martong marked an inline comment as done.
  • Remove isEqualTo
martong marked an inline comment as done.Jun 9 2021, 11:52 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1787

Sure.

1975–1985

Wow, good catch.

martong marked an inline comment as done.Jun 10 2021, 9:18 AM

I have the first measurements results in the attached zip file. The file contains the html file generated by csa-testbench. It's name contains CTU but actually it was a regular non-CTU analysis. The most interesting is probably the run-times, where we can notice a small increase:


Other than that, the number of the warnings seems to be unchanged. The most notable change in the statistics is in the number of paths explored by the analyzer: in some cases (e.g. twin) it increased with 2-3 %.

I have the first measurements results in the attached zip file. The file contains the html file generated by csa-testbench. It's name contains CTU but actually it was a regular non-CTU analysis. The most interesting is probably the run-times, where we can notice a small increase:


Other than that, the number of the warnings seems to be unchanged. The most notable change in the statistics is in the number of paths explored by the analyzer: in some cases (e.g. twin) it increased with 2-3 %.

This sounds amazing! Great job!

vsavchenko accepted this revision.Jun 13 2021, 2:26 AM
This revision is now accepted and ready to land.Jun 13 2021, 2:26 AM
This revision was landed with ongoing or failed builds.Jun 14 2021, 3:19 AM
This revision was automatically updated to reflect the committed changes.

This patch is the first step of a sequence of patches, and not intended to be commited as a standalone change.

Although I planned to commit this in a lock-step when subsequent patches are also accepted, it makes sense to commit now since it's an obvious improvement and the performance penalty remains below a reasonable limit.

Hi,

I'm seeing a failed assertion with this patch.
Reproduce with

clang --analyze bbi-57338.c

Result:

clang: /repo/uabelho/master-github/llvm/include/llvm/ADT/APSInt.h:148: bool llvm::APSInt::operator<(const llvm::APSInt &) const: Assertion `IsUnsigned == RHS.IsUnsigned && "Signedness mismatch!"' failed.

Hi,

I'm seeing a failed assertion with this patch.
Reproduce with

clang --analyze bbi-57338.c

Result:

clang: /repo/uabelho/master-github/llvm/include/llvm/ADT/APSInt.h:148: bool llvm::APSInt::operator<(const llvm::APSInt &) const: Assertion `IsUnsigned == RHS.IsUnsigned && "Signedness mismatch!"' failed.

Good that we found it that early! Thanks Mikael!

Hi,

I'm seeing a failed assertion with this patch.
Reproduce with

clang --analyze bbi-57338.c

Result:

clang: /repo/uabelho/master-github/llvm/include/llvm/ADT/APSInt.h:148: bool llvm::APSInt::operator<(const llvm::APSInt &) const: Assertion `IsUnsigned == RHS.IsUnsigned && "Signedness mismatch!"' failed.

Thanks Mikael for the reproducer, I am going to debug tomorrow.

Hi,

Another failed assertion that started appearing with this patch:

clang --analyze bbi-57589.c

which results in:

clang: ../lib/Support/APInt.cpp:284: int llvm::APInt::compareSigned(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Bit widths must be same for comparison"' failed.


Maybe it's the same root problem, but please make sure you fix both.
Thanks!

Hi,

Another failed assertion that started appearing with this patch:

clang --analyze bbi-57589.c

which results in:

clang: ../lib/Support/APInt.cpp:284: int llvm::APInt::compareSigned(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Bit widths must be same for comparison"' failed.


Maybe it's the same root problem, but please make sure you fix both.
Thanks!

Thanks again Mikael for the report. I could find the root cause and I have a solution that solves the assertions (both test cases are fixed). I am going to upload the fix soon.

Hi,

Another failed assertion that started appearing with this patch:

clang --analyze bbi-57589.c

which results in:

clang: ../lib/Support/APInt.cpp:284: int llvm::APInt::compareSigned(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Bit widths must be same for comparison"' failed.


Maybe it's the same root problem, but please make sure you fix both.
Thanks!

Thanks again Mikael for the report. I could find the root cause and I have a solution that solves the assertions (both test cases are fixed). I am going to upload the fix soon.

Great! Ping me when it's on review, I'll try to look into it ASAP!

Hi,

Another failed assertion that started appearing with this patch:

clang --analyze bbi-57589.c

which results in:

clang: ../lib/Support/APInt.cpp:284: int llvm::APInt::compareSigned(const llvm::APInt &) const: Assertion `BitWidth == RHS.BitWidth && "Bit widths must be same for comparison"' failed.


Maybe it's the same root problem, but please make sure you fix both.
Thanks!

Thanks again Mikael for the report. I could find the root cause and I have a solution that solves the assertions (both test cases are fixed). I am going to upload the fix soon.

Here it is: https://reviews.llvm.org/D104844

I believe this commit exposed a new false-positive bug in [core.DivideZero]. I've filed the report here: https://bugs.llvm.org/show_bug.cgi?id=51940

I believe this exposed another odd issue where a true positive (enabled by this commit) disappears when unrelated code is not present. Bug filed as: https://bugs.llvm.org/show_bug.cgi?id=51950.