[Analyzer] Constraint Manager Negates Difference
Needs ReviewPublic

Authored by baloghadamsoftware on Jul 7 2017, 1:08 AM.

Details

Summary

If range [m .. n] is stored for symbolic expression A - B, then we can deduce the range for B - A which is [-n .. -m]. This is only true for signed types, unless the range is [0 .. 0].

Diff Detail

baloghadamsoftware created this revision.

Wrong patch files was uploaded first.

xazax.hun added inline comments.Jul 11 2017, 2:36 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
472

Could you give an example why do you need this (probably as a test), or constrain the transformation when all the types are the same?

479

I think it would be better to describe why don't we want to do that rather than describing what the code does.

NoQ added inline comments.Jul 12 2017, 12:59 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
468

With this, it sounds as if we're half-way into finally supporting the unary minus operator (:

Could you add a FIXME here: "Once SValBuilder supports unary minus, we should use SValBuilder to obtain the negated symbolic expression instead of constructing the symbol manually. This will allow us to support finding ranges of not only negated SymSymExpr-type expressions, but also of other, simpler expressions which we currently do not know how to negate."

471–476

I'm quite sure that types of A - B and B - A are always equal when it comes to integer promotion rules.

Also, due to our broken SymbolCast (which is often missing where it ideally should be), the type of the A - B symbol may not necessarily be the same as the type that you obtain by applying integer promotion rules to types of A and B.

So i think you should always take the type of A - B and expect to find B - A of the same type in the range map, otherwise give up.

483

Pointer types are currently treated as unsigned, so i'm not sure you want them here.

test/Analysis/ptr-arith.c
267–268

The tests that are now supported should be moved above this comment.

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
483

For me it seems that pointer differences are still pointer types and they are signed. (The range becomes negative upon negative assumption. From test ptr-arith.c:

void use_symbols(int *lhs, int *rhs) {
  clang_analyzer_eval(lhs < rhs); // expected-warning{{UNKNOWN}}
  if (lhs < rhs)
    return;
  clang_analyzer_eval(lhs < rhs); // expected-warning{{FALSE}}

  clang_analyzer_eval(lhs - rhs); // expected-warning{{UNKNOWN}}
  if ((lhs - rhs) != 5)
    return;
  clang_analyzer_eval((lhs - rhs) == 5); // expected-warning{{TRUE}}
}

If I put clang_analyzer_printState() into the empty line in the middle, I get the following range for the difference: [-9223372036854775808, 0]. If I replace int* with unsigned, this range becomes [0, 0], so int* is handled as a signed type here.

Type selection simplified, FIXME added.

NoQ added inline comments.Jul 17 2017, 8:03 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
483

Umm, yeah, i was wrong.

*looks closer*

T is the type of the difference, right? I don't think i'd expect pointer type as the type of the difference.

Could you add test cases for pointers if you intend to support them (and maybe for unsigned types)?

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
483

I do not know exactly the type, but if I remove the T->isPointerType() condition the test in ptr_arith.c will fail with UNKNOWN. So the type of the difference is a type that returns true from T->isPointerType().

Pointer tests are already there in ptr_arith.c. Should I duplicate them?

NoQ added inline comments.Jul 18 2017, 4:25 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
483

I don't see any failing tests when i remove T->isPointerType().

Also this shouldn't be system-specific, because the target triple is hardcoded in ptr-arith.c runline.

Could you point out which test is failing and dump the type in question (-ast-dump, or Type->dump(), or llvm::errs() << QualType::getAsString(), or whatever)?

I think I checked the type of the left side of the difference, not the difference itself. Thus the difference is not a pointer type, it is a signed integer type, the tests pass when I remove that line.

Is this blocked on the same reasons as what was raised in https://reviews.llvm.org/D35109?

Is this blocked on the same reasons as what was raised in https://reviews.llvm.org/D35109?

No, it is blocked because D35109 is a prerequisite.

This one is not blocked anymore since I removed the dependency.

This one is not blocked anymore since I removed the dependency.

But I have to modify the test cases...

Strange, but modifying the tests from m <relation> n to m - n <relation> 0 does not help. The statement if (m - n <relation> 0) does not store a range for m - n in the constraint manager. With the other patch which automatically changes m <relation> n to m - n <relation> 0 the range is stored automatically.

NoQ added a comment.Jan 10 2018, 10:45 AM

Strange, but modifying the tests from m <relation> n to m - n <relation> 0 does not help. The statement if (m - n <relation> 0) does not store a range for m - n in the constraint manager. With the other patch which automatically changes m <relation> n to m - n <relation> 0 the range is stored automatically.

I guess we can easily assume how a SymIntExpr relates to a number by storing a range on the opaque left-hand-side symbol, no matter how complicated it is, but we cannot assume how a symbol relates to a symbol (there's no obvious range to store). That's just how assumeSym currently works.

In D35110#972430, @NoQ wrote:

Strange, but modifying the tests from m <relation> n to m - n <relation> 0 does not help. The statement if (m - n <relation> 0) does not store a range for m - n in the constraint manager. With the other patch which automatically changes m <relation> n to m - n <relation> 0 the range is stored automatically.

I guess we can easily assume how a SymIntExpr relates to a number by storing a range on the opaque left-hand-side symbol, no matter how complicated it is, but we cannot assume how a symbol relates to a symbol (there's no obvious range to store). That's just how assumeSym currently works.

Actually it happens because m - n evaluates to Unknown. The code part responsible for this is the beginning of SValBuilder::makeSymExprValNN(), which prevents m - n-like symbolic expression unless one of m or n is Tainted. Anna added this part 5-6 years ago because some kind of bug, but it seems that it still exists. If I try to remove it then one test executes for days (with loop max count 63 or 64) and two tests fail with an assert.

Updated to be based upon D41938.

Rebased to the newly committed SValbuilder rearranger patch

No more pending dependency, so we can continue the review.

NoQ added a comment.EditedApr 27 2018, 8:11 PM

LGTM! Minor nitpicking in comments.

Currently there's no such problem, but as getRange becomes more complicated, we'll really miss the possibility of saying something like "if there's a range for negated symbol, return getRange(the negated symbol)", so that all other special cases applied. We could have allowed that by canonicalizing symbols (i.e. announce that $A always goes before $B and therefore we will store a range for $A - $B even if we're told to store the range for $B - $A) and then the "if" will become "if the symbol is not canonical, return getRange(the canonicalized symbol)".

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
468

Can we move the whole if into a function?

Eg.,

if (RangeSet *R = getRangeForMinusSymbol(Sym))
  return R->Negate(BV, F)

?

479

ConstraintRangeTy::data_type ~> RangeSet should be easier to read.

NoQ added inline comments.Apr 27 2018, 8:29 PM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
183–194

Hmm, wait a minute, is this actually correct?

For the range [-2³¹, -2³¹ + 1] over a 32-bit integer, the negated range will be [-2³¹, -2³¹] U [2³¹ - 1, 2³¹ - 1].

So there must be a place in the code where we take one range and add two ranges.

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
183–194

The two ends of the range of the type usually stands for +/- infinity. If we add the minimum of the type when negating a negative range, then we lose the whole point of this transformation.

Example: If A - B < 0, then the range of A - B is [-2³¹, -1], If we negate this, and keep the -2³¹ range end, then we get [-2³¹, -2³¹]U[1, 2³¹-1]. However, this does not mean B - A > 0. If we make assumption about this, we get two states instead of one, in the true state A - B's range is [1, 2³¹-1] and in the false state it is [-2³¹, -2³¹]. This is surely not what we want.

NoQ added inline comments.May 2 2018, 2:39 PM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
183–194

Well, we can't turn math into something we want, it is what it is.

Iterator-related symbols are all planned to be within range [-2²⁹, -2²⁹], right? So if we subtract one such symbol from another, it's going to be in range [-2³⁰, 2³⁰]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately? Because this range doesn't include -2³¹, so we won't have any problems with doing negation correctly.

So as usual i propose to get this code mathematically correct and then see if we can ensure correct behavior by enforcing reasonable constraints on our symbols.

lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
183–194

I agree that the code should do mathematically correct things, but what I argue here is what math here means. Computer science is based on math, but it is somewhat different because of finite ranges and overflows. So I initially regarded the minimal and maximal values as infinity. Maybe that is not correct. However, I am sure that negating -2³¹ should never be -2³¹. This is mathematically incorrect, and renders the whole calculation useless, since the union of a positive range and a negative range is unsuitable for any reasoning. I see two options here:

  1. Remove the extension when negating a range which ends at the maximal value of the type. So the negated range begins at the minimal value plus one. However, cut the range which begins at the minimal value of the type by one. So the negated range ends at the maximal value, as in the current version in the patch.
  1. Remove the extension as in 1. and disable the whole negation if we have the range begins at the minimal value.

Iterator checkers are of course not affected because of the max/4 constraints.

Fixed according to the comments.

baloghadamsoftware marked 2 inline comments as done.May 4 2018, 6:53 AM

I chose option 1 for now.

Can we continue the discussion here, please? We should involve Devin and/or George as well if we cannot agree ourselves.

NoQ added inline comments.Fri, May 25, 2:53 PM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
183–194

However, I am sure that negating -2³¹ should never be -2³¹. This is mathematically incorrect, and renders the whole calculation useless, since the union of a positive range and a negative range is unsuitable for any reasoning.

Well, that's how computers already work. And that's how all sorts of abstract algebra work as well, so this is totally mathematically correct. We promise to support the two's complement semantics in the analyzer when it comes to signed integer overflows. Even though it's technically UB, most implementations follow this semantics and a lot of real-world applications explicitly rely on that. Also we cannot simply drop values from our constraint ranges in the general case because the values we drop might be the only valid values, and the assumption that at least one non-dropped value can definitely be taken is generally incorrect. Finding cornercases like this one is one of the strong sides of any static analysis: it is in fact our job to make the user aware of it if he doesn't understand overflow rules. If it cannot be said that the variable on a certain path is non-negative because it might as well be -2³¹, we should totally explore this possibility. If for a certain checker it brings no benefit because such value would be unlikely in certain circumstances, that checker is free to cut off the respective paths, but the core should perform operations precisely. I don't think we have much room for personal preferences here.

I still disagree, but I want the review to continue so I did the requested modifications.

NoQ added inline comments.Tue, May 29, 11:16 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
191

We'll also need to merge the two adjacent segments if the original range had both a [MinSingedValue, MinSignedValue] and a [X, MaxSignedValue]:

Because our immutable sets are sorted, you can conduct the check for the first and the last segment separately.

I think this code needs comments because even though it's short it's pretty hard to get right.

192

Return value of add seems to be accidentally discarded here.

I guess i'll look into adding an __attribute__((warn_unused_result)) to these functions, because it's a super common bug.

NoQ added inline comments.Tue, May 29, 11:17 AM
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
192

Also tests would have saved us.

george.karpenkov resigned from this revision.Wed, May 30, 4:46 PM

Sorry, Artem, but it does not work this way. Even if the symbolic expressions are constrained to [-MAX/4..MAX/4], after rearrangement the difference still uses the whole range, thus m>n becomes m-n>0, where in the false branch the range for m-n is [MIN..0]. Then if we check n>=m the range set is reversed to [MIN..MIN]U[0..MAX] which results UNKNOWN for n-m. It does not solve any of our problems and there is no remedy on the checker's side.

Maybe if we could apply somehow a [-MAX/2..MAX/2] constraint to both sides of the rearranged equality in SimpleSValBuilder.

NoQ added a comment.Fri, Jun 1, 1:11 PM

Sorry, Artem, but it does not work this way. Even if the symbolic expressions are constrained to [-MAX/4..MAX/4], after rearrangement the difference still uses the whole range, thus m>n becomes m-n>0, where in the false branch the range for m-n is [MIN..0]. Then if we check n>=m the range set is reversed to [MIN..MIN]U[0..MAX] which results UNKNOWN for n-m. It does not solve any of our problems and there is no remedy on the checker's side.

Which expressions are constrained? Why does the difference use the whole range? Is it something that could have been fixed by the "enforce that separately" part in my old comment:

iterator-related symbols are all planned to be within range [-2²⁹, -2²⁹], right? So if we subtract one such symbol from another, it's going to be in range [-2³⁰, 2³⁰]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately?

?

In D35110#1119496, @NoQ wrote:

Which expressions are constrained? Why does the difference use the whole range? Is it something that could have been fixed by the "enforce that separately" part in my old comment:

iterator-related symbols are all planned to be within range [-2²⁹, -2²⁹], right? So if we subtract one such symbol from another, it's going to be in range [-2³⁰, 2³⁰]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately?

?

RangedConstraintManager currently does not support Sym+Sym-type of expressions, only Sym+Int-type ones. That is why it cannot calculate that the result is within [-2³⁰, 2³⁰]. In the iterator checkers we do not know anything about the rearranged expressions, it has no access to the sum/difference, the whole purpose of your proposal was to put in into the infrastructure. The checker enforces everything it can but it does not help.

Any idea how to proceed?

NoQ added a comment.Fri, Jun 15, 2:05 PM

In the iterator checkers we do not know anything about the rearranged expressions, it has no access to the sum/difference, the whole purpose of your proposal was to put in into the infrastructure.

It wasn't. The purpose was merely to move (de-duplicate) the code that computes the sum/difference away from the checker. The checker can still operate on the result of such calculation if it knows something about that result that the core doesn't.

NoQ added a comment.EditedFri, Jun 15, 2:08 PM

I still don't think i fully understand your concern. Could you provide an example and point out what exactly goes wrong?

baloghadamsoftware updated this revision to Diff 151720.EditedMon, Jun 18, 8:46 AM

-(-2³¹) == -2³¹

baloghadamsoftware added a comment.EditedMon, Jun 18, 8:51 AM

I added extra assertion into the test for the difference. Interestingly, it also works if I assert n-m is in the range instead of m-n. However, I wonder how can I apply such constraint to the difference of iterator positions without decomposing them to symbols and constants.

I tested all parts of the Iterator Checkers, all tests passed.