This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][solver] Fix issue with symbol non-equality tracking
ClosedPublic

Authored by martong on Sep 21 2020, 4:45 AM.

Details

Summary

We should track non-equivalency (disequality) in case of greater-then or
less-then assumptions.

Diff Detail

Event Timeline

martong created this revision.Sep 21 2020, 4:45 AM
martong requested review of this revision.Sep 21 2020, 4:45 AM

@steakhal, thank you for your time and huge effort in debugging this!

I came up with exactly the same fix! Great job!
I just wanted to refactor it and not having

if (New.isEmpty())
  // this is infeasible assumption
  return nullptr;

ProgramStateRef NewState = setConstraint(St, Sym, New);
return trackNE(NewState, Sym, Int, Adjustment);

repeated in different places

After this "accepted" and a huge thank you 😄

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1349–1379

I suggest to change these two functions this way, so we can avoid the same pattern in 4 different functions

martong updated this revision to Diff 293168.Sep 21 2020, 7:33 AM
martong marked an inline comment as done.
  • Avoid same pattern when checking whether the assumption is infeasible

Thanks for the quick review!
I updated according to your suggestion, so we avoid the same pattern.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1349–1379

I suggest to change these two functions this way, so we can avoid the same pattern in 4 different functions

vsavchenko accepted this revision.Sep 21 2020, 7:35 AM

Amazing!

This revision is now accepted and ready to land.Sep 21 2020, 7:35 AM
This revision was landed with ongoing or failed builds.Sep 21 2020, 8:00 AM
This revision was automatically updated to reflect the committed changes.

What are our options mitigating anything similar happening in the future?

This way any change touching the SymbolicRangeInferrer and any related parts of the analyzer seems to be way too fragile.
Especially, since we might want to add support for comparing SymSyms, just like we try to do in D77792.

What are our options mitigating anything similar happening in the future?

This way any change touching the SymbolicRangeInferrer and any related parts of the analyzer seems to be way too fragile.
Especially, since we might want to add support for comparing SymSyms, just like we try to do in D77792.

What about changing the EXPENSIVE_CHECKS in the assume function in the following way:
Convert all range constraints into a Z3 model and check if that is UNSAT.
In that case, we would have returned a state with contradictions, so we would prevent this particular bug from lurking around to bite us later.

And another possibility could be to create a debug checker, which registers to the assume callback and does the same conversion and check.
This is more appealing to me in some way, like decouples the Z3 dependency from the ConstraintManager header.

Which approach should I prefer? @NoQ @vsavchenko @martong @xazax.hun @Szelethus

What are our options mitigating anything similar happening in the future?

This way any change touching the SymbolicRangeInferrer and any related parts of the analyzer seems to be way too fragile.
Especially, since we might want to add support for comparing SymSyms, just like we try to do in D77792.

What about changing the EXPENSIVE_CHECKS in the assume function in the following way:
Convert all range constraints into a Z3 model and check if that is UNSAT.
In that case, we would have returned a state with contradictions, so we would prevent this particular bug from lurking around to bite us later.

And another possibility could be to create a debug checker, which registers to the assume callback and does the same conversion and check.
This is more appealing to me in some way, like decouples the Z3 dependency from the ConstraintManager header.

Which approach should I prefer? @NoQ @vsavchenko @martong @xazax.hun @Szelethus

I like the second approach, i.e. to have a debug checker. But I don't see, how would this checker validate all constraints at the moment when they are added to the State. And if we don't check all constraints then we might end up checking a state that is invalid for a while (i.e. that might became invalid previously and not because of the lastly added constraint.) So, essentially, my gut feeling is that both approaches should be validating all newly added constraints against Z3. And that might be too slow, it would have the same speed as using z3 instead of the range based solver.

I like the second approach, i.e. to have a debug checker. But I don't see, how would this checker validate all constraints at the moment when they are added to the State.

ProgramStateRef evalAssume(ProgramStateRef State, SVal Cond, bool Assumption) checker callback is called after any assume call.
So, we would have a State which has all? the constraints stored in the appropriate GDM. I'm not sure if we should aggregate all the constraints till this node - like we do for refutation. I think, in this case, we should just make sure that the current state does not have any contradiction.


Here is my proof-of-concept:

I've dumped the state and args of evalAssume for the repro example, pretending that we don't reach a code-path on the infeasable path to crash:

void avoidInfeasibleConstraintforGT(int a, int b) {
  int c = b - a;
  if (c <= 0)
    return;
  if (a != b) {
    clang_analyzer_warnIfReached(); // expected-warning{{REACHABLE}}
    return;
  }
  clang_analyzer_warnIfReached(); // eh, we will reach this..., #1
  // These are commented out, to pretend that we don't find out that we are on an infeasable path...
  // a == b
  // if (c < 0)
  //   ;
}

We reach the #1 line, but more importantly, we will have the following state dumps:

evalAssume: assuming Cond: (reg_$1<int a>) != (reg_$0<int b>) to be true in state:
"program_state": {
  "constraints": [
    { "symbol": "(reg_$0<int b>) - (reg_$1<int a>)", "range": "{ [1, 2147483647] }" },
    { "symbol": "(reg_$1<int a>) != (reg_$0<int b>)", "range": "{ [-2147483648, -1], [1, 2147483647] }" }
  ]
}

evalAssume: assuming Cond: (reg_$1<int a>) != (reg_$0<int b>) to be false in state:
"program_state": {
  "constraints": [
    { "symbol": "(reg_$0<int b>) - (reg_$1<int a>)", "range": "{ [1, 2147483647] }" },
    { "symbol": "(reg_$1<int a>) != (reg_$0<int b>)", "range": "{ [0, 0] }" }
  ]
}

As you can see, the latter is the infeasable path, and if we have had serialized the constraints and let the Z3 check it, we would have gotten that it's unfeasable.
At this point the checker can dump any useful data for debugging, and crash the analyzer to let us know that something really bad happened.

[...] And that might be too slow, it would have the same speed as using z3 instead of the range based solver.

Yes, it will probably freaking slow, but at least we have something.