If range [m .. n] is stored for symbolic expression A - B, then we can deduce the range for B - A which is [-n .. -m]. This is only true for signed types, unless the range is [0 .. 0].
Details
Diff Detail
- Repository
- rC Clang
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
476 | Could you give an example why do you need this (probably as a test), or constrain the transformation when all the types are the same? | |
483 | I think it would be better to describe why don't we want to do that rather than describing what the code does. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
472 | With this, it sounds as if we're half-way into finally supporting the unary minus operator (: Could you add a FIXME here: "Once SValBuilder supports unary minus, we should use SValBuilder to obtain the negated symbolic expression instead of constructing the symbol manually. This will allow us to support finding ranges of not only negated SymSymExpr-type expressions, but also of other, simpler expressions which we currently do not know how to negate." | |
475โ480 | I'm quite sure that types of A - B and B - A are always equal when it comes to integer promotion rules. Also, due to our broken SymbolCast (which is often missing where it ideally should be), the type of the A - B symbol may not necessarily be the same as the type that you obtain by applying integer promotion rules to types of A and B. So i think you should always take the type of A - B and expect to find B - A of the same type in the range map, otherwise give up. | |
487 | Pointer types are currently treated as unsigned, so i'm not sure you want them here. | |
test/Analysis/ptr-arith.c | ||
267โ268 | The tests that are now supported should be moved above this comment. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
487 | For me it seems that pointer differences are still pointer types and they are signed. (The range becomes negative upon negative assumption. From test ptr-arith.c: void use_symbols(int *lhs, int *rhs) { clang_analyzer_eval(lhs < rhs); // expected-warning{{UNKNOWN}} if (lhs < rhs) return; clang_analyzer_eval(lhs < rhs); // expected-warning{{FALSE}} clang_analyzer_eval(lhs - rhs); // expected-warning{{UNKNOWN}} if ((lhs - rhs) != 5) return; clang_analyzer_eval((lhs - rhs) == 5); // expected-warning{{TRUE}} } If I put clang_analyzer_printState() into the empty line in the middle, I get the following range for the difference: [-9223372036854775808, 0]. If I replace int* with unsigned, this range becomes [0, 0], so int* is handled as a signed type here. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
487 | Umm, yeah, i was wrong. *looks closer* T is the type of the difference, right? I don't think i'd expect pointer type as the type of the difference. Could you add test cases for pointers if you intend to support them (and maybe for unsigned types)? |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
487 | I do not know exactly the type, but if I remove the T->isPointerType() condition the test in ptr_arith.c will fail with UNKNOWN. So the type of the difference is a type that returns true from T->isPointerType(). Pointer tests are already there in ptr_arith.c. Should I duplicate them? |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
487 | I don't see any failing tests when i remove T->isPointerType(). Also this shouldn't be system-specific, because the target triple is hardcoded in ptr-arith.c runline. Could you point out which test is failing and dump the type in question (-ast-dump, or Type->dump(), or llvm::errs() << QualType::getAsString(), or whatever)? |
I think I checked the type of the left side of the difference, not the difference itself. Thus the difference is not a pointer type, it is a signed integer type, the tests pass when I remove that line.
Is this blocked on the same reasons as what was raised in https://reviews.llvm.org/D35109?
Strange, but modifying the tests from m <relation> n to m - n <relation> 0 does not help. The statement if (m - n <relation> 0) does not store a range for m - n in the constraint manager. With the other patch which automatically changes m <relation> n to m - n <relation> 0 the range is stored automatically.
I guess we can easily assume how a SymIntExpr relates to a number by storing a range on the opaque left-hand-side symbol, no matter how complicated it is, but we cannot assume how a symbol relates to a symbol (there's no obvious range to store). That's just how assumeSym currently works.
Actually it happens because m - n evaluates to Unknown. The code part responsible for this is the beginning of SValBuilder::makeSymExprValNN(), which prevents m - n-like symbolic expression unless one of m or n is Tainted. Anna added this part 5-6 years ago because some kind of bug, but it seems that it still exists. If I try to remove it then one test executes for days (with loop max count 63 or 64) and two tests fail with an assert.
LGTM! Minor nitpicking in comments.
Currently there's no such problem, but as getRange becomes more complicated, we'll really miss the possibility of saying something like "if there's a range for negated symbol, return getRange(the negated symbol)", so that all other special cases applied. We could have allowed that by canonicalizing symbols (i.e. announce that $A always goes before $B and therefore we will store a range for $A - $B even if we're told to store the range for $B - $A) and then the "if" will become "if the symbol is not canonical, return getRange(the canonicalized symbol)".
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
472 | Can we move the whole if into a function? Eg., if (RangeSet *R = getRangeForMinusSymbol(Sym)) return R->Negate(BV, F) ? | |
483 | ConstraintRangeTy::data_type ~> RangeSet should be easier to read. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
183โ194 | Hmm, wait a minute, is this actually correct? For the range [-2ยณยน, -2ยณยน + 1] over a 32-bit integer, the negated range will be [-2ยณยน, -2ยณยน] U [2ยณยน - 1, 2ยณยน - 1]. So there must be a place in the code where we take one range and add two ranges. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
183โ194 | The two ends of the range of the type usually stands for +/- infinity. If we add the minimum of the type when negating a negative range, then we lose the whole point of this transformation. Example: If A - B < 0, then the range of A - B is [-2ยณยน, -1], If we negate this, and keep the -2ยณยน range end, then we get [-2ยณยน, -2ยณยน]U[1, 2ยณยน-1]. However, this does not mean B - A > 0. If we make assumption about this, we get two states instead of one, in the true state A - B's range is [1, 2ยณยน-1] and in the false state it is [-2ยณยน, -2ยณยน]. This is surely not what we want. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
183โ194 | Well, we can't turn math into something we want, it is what it is. Iterator-related symbols are all planned to be within range [-2ยฒโน, -2ยฒโน], right? So if we subtract one such symbol from another, it's going to be in range [-2ยณโฐ, 2ยณโฐ]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately? Because this range doesn't include -2ยณยน, so we won't have any problems with doing negation correctly. So as usual i propose to get this code mathematically correct and then see if we can ensure correct behavior by enforcing reasonable constraints on our symbols. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
183โ194 | I agree that the code should do mathematically correct things, but what I argue here is what math here means. Computer science is based on math, but it is somewhat different because of finite ranges and overflows. So I initially regarded the minimal and maximal values as infinity. Maybe that is not correct. However, I am sure that negating -2ยณยน should never be -2ยณยน. This is mathematically incorrect, and renders the whole calculation useless, since the union of a positive range and a negative range is unsuitable for any reasoning. I see two options here:
Iterator checkers are of course not affected because of the max/4 constraints. |
Can we continue the discussion here, please? We should involve Devin and/or George as well if we cannot agree ourselves.
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
183โ194 |
Well, that's how computers already work. And that's how all sorts of abstract algebra work as well, so this is totally mathematically correct. We promise to support the two's complement semantics in the analyzer when it comes to signed integer overflows. Even though it's technically UB, most implementations follow this semantics and a lot of real-world applications explicitly rely on that. Also we cannot simply drop values from our constraint ranges in the general case because the values we drop might be the only valid values, and the assumption that at least one non-dropped value can definitely be taken is generally incorrect. Finding cornercases like this one is one of the strong sides of any static analysis: it is in fact our job to make the user aware of it if he doesn't understand overflow rules. If it cannot be said that the variable on a certain path is non-negative because it might as well be -2ยณยน, we should totally explore this possibility. If for a certain checker it brings no benefit because such value would be unlikely in certain circumstances, that checker is free to cut off the respective paths, but the core should perform operations precisely. I don't think we have much room for personal preferences here. |
I still disagree, but I want the review to continue so I did the requested modifications.
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
191 | We'll also need to merge the two adjacent segments if the original range had both a [MinSingedValue, MinSignedValue] and a [X, MaxSignedValue]: Because our immutable sets are sorted, you can conduct the check for the first and the last segment separately. I think this code needs comments because even though it's short it's pretty hard to get right. | |
192 | Return value of add seems to be accidentally discarded here. I guess i'll look into adding an __attribute__((warn_unused_result)) to these functions, because it's a super common bug. |
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
192 | Also tests would have saved us. |
Sorry, Artem, but it does not work this way. Even if the symbolic expressions are constrained to [-MAX/4..MAX/4], after rearrangement the difference still uses the whole range, thus m>n becomes m-n>0, where in the false branch the range for m-n is [MIN..0]. Then if we check n>=m the range set is reversed to [MIN..MIN]U[0..MAX] which results UNKNOWN for n-m. It does not solve any of our problems and there is no remedy on the checker's side.
Maybe if we could apply somehow a [-MAX/2..MAX/2] constraint to both sides of the rearranged equality in SimpleSValBuilder.
Which expressions are constrained? Why does the difference use the whole range? Is it something that could have been fixed by the "enforce that separately" part in my old comment:
iterator-related symbols are all planned to be within range [-2ยฒโน, -2ยฒโน], right? So if we subtract one such symbol from another, it's going to be in range [-2ยณโฐ, 2ยณโฐ]. Can we currently infer that? Or maybe we should make the iterator checker to enforce that separately?
?
RangedConstraintManager currently does not support Sym+Sym-type of expressions, only Sym+Int-type ones. That is why it cannot calculate that the result is within [-2ยณโฐ, 2ยณโฐ]. In the iterator checkers we do not know anything about the rearranged expressions, it has no access to the sum/difference, the whole purpose of your proposal was to put in into the infrastructure. The checker enforces everything it can but it does not help.
In the iterator checkers we do not know anything about the rearranged expressions, it has no access to the sum/difference, the whole purpose of your proposal was to put in into the infrastructure.
It wasn't. The purpose was merely to move (de-duplicate) the code that computes the sum/difference away from the checker. The checker can still operate on the result of such calculation if it knows something about that result that the core doesn't.
I still don't think i fully understand your concern. Could you provide an example and point out what exactly goes wrong?
I added extra assertion into the test for the difference. Interestingly, it also works if I assert n-m is in the range instead of m-n. However, I wonder how can I apply such constraint to the difference of iterator positions without decomposing them to symbols and constants.
Ok, code makes sense to me now!
I think we still need a few new tests to cover the corner cases.
I don't see how iterator checker is different from the tests. The code of the program in your tests doesn't decompose m - n into symbols and constants, it simply subtracts an opaque value n (whatever it is) from an opaque value m (whatever it is) and makes assumptions on the opaque result of the subtraction (whatever it is). The checker should do the same, i guess?
lib/StaticAnalyzer/Core/RangeConstraintManager.cpp | ||
---|---|---|
177โ178 | I guess the comment needs to be updated. | |
189 | I suggest a few sanity checks here (untested): |
Thank you!! Please commit.
test/Analysis/constraint_manager_negate_difference.c | ||
---|---|---|
96 | Whitespace (: |