This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][solver] Track symbol equivalence
ClosedPublic

Authored by vsavchenko on Jun 24 2020, 3:08 AM.

Details

Summary

For the most cases, we try to reason about symbol either based on the
information we know about that symbol in particular or about its
composite parts. This is faster and eliminates costly brute force
searches through existing constraints.

However, we do want to support some cases that are widespread enough
and involve reasoning about different existing constraints at once.
These include:

  • resoning about 'a - b' based on what we know about 'b - a'
  • reasoning about 'a <= b' based on what we know about 'a > b' or 'a < b'

This commit expands on that part by tracking symbols known to be equal
while still avoiding brute force searches. It changes the way we track
constraints for individual symbols. If we know for a fact that 'a == b'
then there is no need in tracking constraints for both 'a' and 'b' especially
if these constraints are different. This additional relationship makes
dead/live logic for constraints harder as we want to maintain as much
information on the equivalence class as possible, but we still won't
carry the information that we don't need anymore.

Diff Detail

Event Timeline

vsavchenko created this revision.Jun 24 2020, 3:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 24 2020, 3:08 AM

Fix incorrect comment

I only checked the test cases and the comments so far and it looks very useful and promising. I really hope that the performance will be ok :) I'll look at the actual code changes later.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
425

I found the mention of union confusing here. Especially since merging means intersection in terms of the ranges. I think I know what you meant but some additional clarification is welcome.

vsavchenko marked an inline comment as done.Jun 24 2020, 4:32 AM

I really hope that the performance will be ok :)

I had a test run for one of the earlier implementations of this and it looks like it adds 1-2% to the execution time. And of course I will attach more info on that later.

Additionally, I have a couple of disappeared FPs concerning reference counting, that I still need to figure out.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
425

Gotcha, I guess I'll add parallels with the Union-Find data structure in the implementation and make it more transparent.

NoQ added inline comments.Jun 24 2020, 9:01 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
452

Assignment operator is currently the only thing that makes this class mutable (by allowing to change the ID after the class is constructed). Given that we want to put it into the program state, i don't think we want it to be mutable. Can we remove this operator?

537

Do i understand correctly that this isn't used yet and it's for the later patches?

1431

This makes me slightly worry about nondeterminism: height may depend on things like numeric values of pointers. I guess at the end of the day this will only manifest in choosing a different representative for the merged class, so it shouldn't be too bad, but i'd still rather double-check.

1656–1658

Ok, this turned out to be much scarier than i expected. At least, can we somehow assert that our data structures remain internally consistent after these operations? I.e., things like "a symbol points to an equivalence class iff it belongs to the set of members of that class", etc.

vsavchenko marked 4 inline comments as done.Jun 25 2020, 5:22 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
452

Sure, that's a good point!

537

Not exactly, it is used in two symmetric cases:

  1. We assumed a condition and we try to understand if what we assumed is an equality operation
  2. We try understand something about a symbol and we want to understand if it is an equality operation

So, in the first case, the branch covering IsEquality == false does nothing and is designed for the later patches.

The second case, however, does work right now. If we see an equality operation where operands are known to be a part of the same class, we can tell for sure the result of the comparison. This way a == b is true and a != b is false. You can find this logic in getRangeForEqualities.

1431

I agree, but as you pointed out correctly, it affects only which ID is used for the class. Other ID is cease to exist after this function. ID is used other than for comparison only for trivial classes, which is not going to happen because it is guaranteed to be non-trivial after the merge. The only thing that it might affect is the ordering in the map, but we already have a problem with that as we use pointers for keys.

1656–1658

Assertion like this might cost us double of what we have in this function right now.

NoQ added inline comments.Jun 25 2020, 6:27 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
537

Ok, makes sense!

1656–1658

Sounds pretty amazing and worth every microsecond.

vsavchenko updated this revision to Diff 275608.Jul 6 2020, 2:00 AM

Fix review remarks

NoQ accepted this revision.Jul 6 2020, 8:16 PM

🎉!

I vaguely remember that we talked offline about a few nasty test cases that with casts and overflows, would it make sense to add them? Like, even if they don't pass yet, they may prove valuable.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1467

"lose"!

1482

"I am the solver!!!"

I guess what you're trying to say is that every time there's an infeasible state here, technically one of the infer() functions had a chance to prevent it (not necessarily easily).

This revision is now accepted and ready to land.Jul 6 2020, 8:16 PM
vsavchenko updated this revision to Diff 275949.Jul 7 2020, 2:08 AM

Fix comments and add a test for downcasts

xazax.hun added inline comments.Jul 7 2020, 2:31 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
506

Does the semantics if this differ from llvm::APInt::isNullValue?

1249

Sorry, but I am a bit confused here.

Does haveEqualConstraints have anything to do with equivalence classes?

I.e., what if I have two symbols with the same constraints (but those two symbols were never compared).

void f(int a, int b) {
  int c = clamp(a, 5, 10);
  int d = clamp(b, 5, 10);
}

In the code above if analyzer understands clamp, the range for c and d will be the same. Even though we never really compared them.
I would expect haveEqualConstraints to return true, even if they do not belong to the same equivalence class.

Do I miss something?

1375

So we basically do a conversion to Symbol -> Ranges from Class -> Ranges.
I wonder if it is possible to get rid of this conversion in the future to get some performance benefits.
We could either make all code work with both kind of range maps or have something like (Symbol + Class) -> Ranges to avoid conversion.
What do you think?

I am not opposed to this code at the moment, I just wonder if there is a relatively low hanging fruit for some performance gains in the future.

xazax.hun added inline comments.Jul 7 2020, 2:37 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1249

Never mind, I misunderstood this function. It operates on program states not on symbols.

vsavchenko marked 8 inline comments as done.Jul 7 2020, 3:34 AM
vsavchenko added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
506

Good catch!

1249

Nothing to be sorry about, I'm happy to clarify!

haveEqualConstraints is here for our join heuristic that helps removing duplicating paths.

1375

This is here only for a very specific mode - when we double check found warnings with Z3. That mode needs constraints assigned to symbols, so here we construct such a set on-the-fly. So, we store ALL of the ranges in the map Class -> Ranges and construct Symbol -> Ranges on very special occasions.

vsavchenko updated this revision to Diff 275981.Jul 7 2020, 3:41 AM

Get rid of 'isZero'

xazax.hun accepted this revision.Jul 7 2020, 4:59 AM

Thanks, LGTM!

This revision was automatically updated to reflect the committed changes.

Hi Valery,

Together with @steakhal we have found a serious issue.
The below code crashes if you compile with -DEXPENSIVE_CHECKS.
The analyzer goes on an unfeasible path, the State has a contradiction.

We think that getRange(Sym("a != b") should return a set that does not contain 0.
https://github.com/llvm/llvm-project/blob/e63b488f2755f91e8147fd678ed525cf6ba007cc/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp#L2064
Currently that goes down to infer(QualType("int")) which results a RangeSet[INT_MIN, INT_MAX].
Stach trace attached.

// RUN: %clang_analyze_cc1 %s \
// RUN:   -analyzer-checker=core \
// RUN:   -analyzer-checker=debug.ExprInspection \
// RUN:   -triple x86_64-unknown-linux \
// RUN:   -verify

// expected-no-diagnostics

void f(int a, int b) {
  int c = b - a;
  if (c <= 0)
    return;
  // c > 0
  // b - a > 0
  // b > a
  if (a != b)
    return;
  // a == b
  // This is an unfeasible path, the State has a contradiction.
  if (c < 0) // crash here
    ;
}
#0  (anonymous namespace)::SymbolicRangeInferrer::inferRange<clang::ento::SymExpr const*> (BV=..., F=..., State=..., Origin=0x55b9979bed08) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:715
#1  0x00007fa2314dddc9 in (anonymous namespace)::RangeConstraintManager::getRange (this=0x55b99791ab10, State=..., Sym=0x55b9979bed08) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2012
#2  0x00007fa2314de1e9 in (anonymous namespace)::RangeConstraintManager::assumeSymEQ (this=0x55b99791ab10, St=..., Sym=0x55b9979bed08, Int=..., Adjustment=...) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2064
#3  0x00007fa23150470d in clang::ento::RangedConstraintManager::assumeSymUnsupported (this=0x55b99791ab10, State=..., Sym=0x55b9979bed08, Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp:136
#4  0x00007fa2315353a9 in clang::ento::SimpleConstraintManager::assumeAux (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:62
#5  0x00007fa23153518b in clang::ento::SimpleConstraintManager::assume (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:46
#6  0x00007fa2315350e5 in clang::ento::SimpleConstraintManager::assume (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:41
#7  0x00007fa2313d39b3 in clang::ento::ConstraintManager::assumeDual (this=0x55b99791ab10, State=..., Cond=...) at ../../git/llvm-project/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h:105
#8  0x00007fa2313d3bfc in clang::ento::ProgramState::assume (this=0x55b9979beef8, Cond=...) at ../../git/llvm-project/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h:681
#9  0x00007fa231436b8a in clang::ento::ExprEngine::evalEagerlyAssumeBinOpBifurcation (this=0x7fffdf9ce9d0, Dst=..., Src=..., Ex=0x55b9979893f0) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:3058

Hi Valery,

Together with @steakhal we have found a serious issue.
The below code crashes if you compile with -DEXPENSIVE_CHECKS.
The analyzer goes on an unfeasible path, the State has a contradiction.

We think that getRange(Sym("a != b") should return a set that does not contain 0.
https://github.com/llvm/llvm-project/blob/e63b488f2755f91e8147fd678ed525cf6ba007cc/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp#L2064
Currently that goes down to infer(QualType("int")) which results a RangeSet[INT_MIN, INT_MAX].
Stach trace attached.

// RUN: %clang_analyze_cc1 %s \
// RUN:   -analyzer-checker=core \
// RUN:   -analyzer-checker=debug.ExprInspection \
// RUN:   -triple x86_64-unknown-linux \
// RUN:   -verify

// expected-no-diagnostics

void f(int a, int b) {
  int c = b - a;
  if (c <= 0)
    return;
  // c > 0
  // b - a > 0
  // b > a
  if (a != b)
    return;
  // a == b
  // This is an unfeasible path, the State has a contradiction.
  if (c < 0) // crash here
    ;
}
#0  (anonymous namespace)::SymbolicRangeInferrer::inferRange<clang::ento::SymExpr const*> (BV=..., F=..., State=..., Origin=0x55b9979bed08) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:715
#1  0x00007fa2314dddc9 in (anonymous namespace)::RangeConstraintManager::getRange (this=0x55b99791ab10, State=..., Sym=0x55b9979bed08) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2012
#2  0x00007fa2314de1e9 in (anonymous namespace)::RangeConstraintManager::assumeSymEQ (this=0x55b99791ab10, St=..., Sym=0x55b9979bed08, Int=..., Adjustment=...) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:2064
#3  0x00007fa23150470d in clang::ento::RangedConstraintManager::assumeSymUnsupported (this=0x55b99791ab10, State=..., Sym=0x55b9979bed08, Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/RangedConstraintManager.cpp:136
#4  0x00007fa2315353a9 in clang::ento::SimpleConstraintManager::assumeAux (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:62
#5  0x00007fa23153518b in clang::ento::SimpleConstraintManager::assume (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:46
#6  0x00007fa2315350e5 in clang::ento::SimpleConstraintManager::assume (this=0x55b99791ab10, State=..., Cond=..., Assumption=false) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/SimpleConstraintManager.cpp:41
#7  0x00007fa2313d39b3 in clang::ento::ConstraintManager::assumeDual (this=0x55b99791ab10, State=..., Cond=...) at ../../git/llvm-project/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ConstraintManager.h:105
#8  0x00007fa2313d3bfc in clang::ento::ProgramState::assume (this=0x55b9979beef8, Cond=...) at ../../git/llvm-project/clang/include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h:681
#9  0x00007fa231436b8a in clang::ento::ExprEngine::evalEagerlyAssumeBinOpBifurcation (this=0x7fffdf9ce9d0, Dst=..., Src=..., Ex=0x55b9979893f0) at ../../git/llvm-project/clang/lib/StaticAnalyzer/Core/ExprEngine.cpp:3058

Hi Gabor,

I'll take a look at this problem ASAP!

Thanks for such a thorough analysis of the issue!

We came up with a quick fix, let us know what you think:
https://reviews.llvm.org/D88019