This is an archive of the discontinued LLVM Phabricator instance.

clang/test/Analysis/constant-folding.c
305	I think this is always true.
317	I'm not sure I understand why do you make this assumption.
320–321	Here is the same note, `clang_analyzer_eval` doesn't produce `TRUE` only because this condition might be true.
328	If `c == d == 1` this doesn't hold. Did you want to check here that we account for overflows?
332	What about other cases here? I mean specifically when two ranges are have a form `[A, B]`
335	`clang_analyzer_eval` returns `TRUE` only if we can deduct that this expression on a particular path is always true. Here it's not going to hold.

ChuanqiXu added a subscriber: ChuanqiXu.Jun 1 2021, 4:13 AM

xazax.hun added inline comments.Jun 1 2021, 8:45 PM

clang/test/Analysis/constant-folding.c
295	Since both `a` and `b` are unsigned values, their sum is also unsigned. We basically check if their sum could be zero. I don't see why it could not in this case (e.g. when `b == UINT_MAX - a + 1`). Do I miss something?
296	If `a == INT_MAX + 1 && b == 0` then `(a + b) > 0`. Do I miss something? Or is this a bug?
305	So is `a >= 0 && b >= 0` unless I miss something.
312	I think `UINT_MAX + UINT_MAX` supposed to be positive. Do I miss something here?

Our pre-commit testing shows that the new tests fail, eg.:

https://buildkite.com/llvm-project/premerge-checks/builds/41391#b557c86c-0587-4aee-a06b-8a4de6d771c4

Failed Tests (1):
  Clang :: Analysis/constant-folding.c

Did you figure out how to *run* the tests locally? Tests for the static analyzer are in the check-clang-analysis target.

@NoQ I figured out the tests but while testing against Z3, I mixed up constraints. I am changing those.

clang/test/Analysis/constant-folding.c
295	I am editing that condition to be `b >= INT_MAX`.
305	@vsavchenko true, I am removing that conditional as it serves no purpose there. @xazax.hun that's true.
312	Should not `UINT_MAX + UINT_MAX` wrap around to be negative? I think I misunderstood it.
320–321	understood.
328	Yes, I was checking for that. I am changing that case.
332	I am adding more cases here.

manas added inline comments.Jun 1 2021, 11:12 PM

clang/test/Analysis/constant-folding.c
312	Understood. My bad.

In D103440#2792814, @NoQ wrote:
Our pre-commit testing shows that the new tests fail, eg.:

https://buildkite.com/llvm-project/premerge-checks/builds/41391#b557c86c-0587-4aee-a06b-8a4de6d771c4
Failed Tests (1):
  Clang :: Analysis/constant-folding.c
Did you figure out how to *run* the tests locally? Tests for the static analyzer are in the check-clang-analysis target.

They are actually supposed to fail, we first decide on the scope of what @manas will support and implement the actual feature later.

Fixed test cases expecting wrong assertions and added few more test cases.

Harbormaster completed remote builds in B107213: Diff 349225.Jun 2 2021, 4:40 AM

manas marked 10 inline comments as done.Jun 2 2021, 9:23 PM

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is `[INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

clang/test/Analysis/constant-folding.c
309	Good!
312	Great, it's good to check negative numbers. I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication. if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10; // one goes inside of if, one doesn't . . . } if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10). // 1 path was infeasible: a == 10 and a < 5 // Two of these paths go inside of the if, one doesn't . . . clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE } clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each `clang_analyzer_eval(...)` In this example, you still have residual paths where `c != INT_MIN`, `c == INT_MIN and d != INT_MIN`, and `c == INT_MIN and d == INT_MIN`.
334–335	How can these two statements be TRUE at the same time?

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is [INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

If I understood correctly, does a case like: c, d in [INT_MAX/2 - 10, INT_MAX/2 + 10] works? It will produce an overflowing range of [INT_MIN, INT_MIN + 18] U [INT_MAX - 21, INT_MAX]. I will add that to the test-set, if that is so.

For now, I have added tests for residual paths and negations of particular values, and edited buggy tests.

clang/test/Analysis/constant-folding.c
312	I should add tests for these paths as well so that these can be checked. For further cases, I will enforce single path evaluation in test cases (which will make it easier to handle here).
334–335	Right, my bad. They both should be `UNKNOWN`. As, `c` and `d` are signed 32-bit positive integers, hence their respective values will be in `[1, INT_MAX]`. When `c == d == 1` then `c + d == 2`, and when `c == d == INT_MAX` then `c + d == -2` (overflow). Only possible values which `c + d` cannot attain are `{-1, 0, 1}`. As a simple proof: Dividing this range into R1 = [1, INT_MAX/2] and R2 = [(INT_MAX/2) + 1, INT_MAX] Now, `c + d` will have 4 combination of ranges to be solved: `R1 + R1` : this will never overflow as the maximum value it can attain will be `INT_MAX - 1` (when `c == d == INT_MAX/2`) while in `R1 + R2`, `R2 + R1`, and `R2 + R2` expressions will overflow, leading value of `c + d` from `INT_MIN` to `-2`, except for case `INT_MAX/2 + (INT_MAX/2 + 1)`, where `c + d == INT_MAX`. Hence, only possible values which can never be attained by `c + d` will be `{-1, 0, 1}`. So, the range for our purposes will be `[INT_MIN, -2] U [2, INT_MAX]`. I think I should write tests for `c + d != {-1, 0, 1}` which will make more sense here.

Added tests for residual paths and negation of certain values, and fixed expected warnings for UNKNOWN cases.

Harbormaster completed remote builds in B107593: Diff 349749.Jun 3 2021, 8:08 PM

vsavchenko added inline comments.Jun 4 2021, 1:18 AM

clang/test/Analysis/constant-folding.c
312	In these tests we should really avoid having multiple feasible paths going through one `clang_analyzer_eval`. We can add `// expected-warning` to it multiple times, but it's harder to understand as a reader of these tests because you need to split all the paths in your mind just like the analyzer does. It can be useful when testing other features of the analyzer, but here it's not essential. And I don't think that you fully understood me. When multiple paths go through `clang_analyzer_eval`, each of them produces a separate "warning". That's why if you want to account for residual paths, you shouldn't add more `if` statements, but add something like: clang_analyzer_eval(a == 10); // expected-warning{{TRUE}} // expected-warning@-1{{TRUE}} // expected-warning@-2{{FALSE}} So, please, instead of writing tests like this, rewrite your test, so that we don't have residual paths.
334–335	I see, that's a good test!

In D103440#2797991, @manas wrote:

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is [INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

If I understood correctly, does a case like: c, d in [INT_MAX/2 - 10, INT_MAX/2 + 10] works? It will produce an overflowing range of [INT_MIN, INT_MIN + 18] U [INT_MAX - 21, INT_MAX]. I will add that to the test-set, if that is so.

Yes, exactly!

manas added inline comments.Jun 4 2021, 2:30 AM

clang/test/Analysis/constant-folding.c
312	Understood. I am changing those for single path evaluation accordingly.

Fix for cases involving residual paths and add case for overflowing range near extremum of a type

Harbormaster completed remote builds in B107627: Diff 349796.Jun 4 2021, 3:02 AM

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

For example:

(declare-const a ( _ BitVec 32 ))
(declare-const b ( _ BitVec 32 ))

; Assume 0 <= a, b, <= 5
(assert (bvsge a #x00000000))
(assert (bvsge b #x00000000))
(assert (bvsle a #x00000005))
(assert (bvsle b #x00000005))

; Check if 0 <= a + b  <= 10
(assert (not (and (bvsle (bvadd a b) #x0000000A) (bvsge (bvadd a b) #x00000000))))

(check-sat)

Of course, the example above is obvious, but with overflows and other corner cases it is great to have a tool doublecheck our assumptions, and share how to reproduce the results.

In D103440#2799629, @xazax.hun wrote:

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

In D103440#2799629, @xazax.hun wrote:
I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

For example:
(declare-const a ( _ BitVec 32 ))
(declare-const b ( _ BitVec 32 ))

; Assume 0 <= a, b, <= 5
(assert (bvsge a #x00000000))
(assert (bvsge b #x00000000))
(assert (bvsle a #x00000005))
(assert (bvsle b #x00000005))

; Check if 0 <= a + b  <= 10
(assert (not (and (bvsle (bvadd a b) #x0000000A) (bvsge (bvadd a b) #x00000000))))

(check-sat)
Of course, the example above is obvious, but with overflows and other corner cases it is great to have a tool doublecheck our assumptions, and share how to reproduce the results.

I usually add things like this not about the tests, but about the implementation. I think it provides a bit more generality. But usually I do it as a comment to the patchset, maybe it is also good to put it in the code as well.

In D103440#2800122, @manas wrote:

In D103440#2799629, @xazax.hun wrote:

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

Hmm, so you mean we can check if the analyzer was compiled with Z3 and if so, verify the same things by it?

In D103440#2800710, @vsavchenko wrote:

In D103440#2800122, @manas wrote:

In D103440#2799629, @xazax.hun wrote:

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

Hmm, so you mean we can check if the analyzer was compiled with Z3 and if so, verify the same things by it?

Yeah in some sense. But I think that having proof for every test case may become redundant for certain cases.

For e.g., consider two test cases for addition and subtraction operator:

c == [0, 10] and d == [-10, 0] will result in (c + d) == [-10, 10]
c == d == [0, 10] will result in (c - d) == [-10, 10]

But the first test case can be modeled as c - (- d)equivalently c - D, that is,

usage of subtraction binary operator : (c - D), and
symmetrical inversion of range around origin (- d) for symbol d. This will shift the range from [-10, 0] to [0, 10].

Considering having proof for every test case will make the proof for test-case 1 kind of redundant in comparison to test-case 2.

So, I think we should go with @vsavchenko 's method of adding Z3 proof with the implementation (in code), instead of test cases themselves.

manas updated this revision to Diff 352283.Jun 15 2021, 4:05 PM

manas edited the summary of this revision. (Show Details)

Add logic for computing rangeset for an expression containing BO_Add.

Herald added a subscriber: martong. · View Herald TranscriptJun 15 2021, 4:05 PM

manas added inline comments.Jun 15 2021, 4:09 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413–1414	I changed the logic from using getCrucialPoints (which I mentioned in the e-mail thread) to simple checks to determine overflows using signedness. But the failed assertions again popped up. And I was unable to pin-point the issue here. Can someone help me? Although, I am thinking of revising the above logic by using bitwise methods to detect overflows.

Harbormaster completed remote builds in B109405: Diff 352283.Jun 15 2021, 8:08 PM

vsavchenko added inline comments.Jun 16 2021, 1:29 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1409	This clause is exactly the same as the previous one, it is a mistake. And I think we should have a test that could've shown that. Also, since you are checking for overflows for both the beginning and the end, we should have tests where both overflow.
1413–1414	This is actually one place that I won't expect an assertion failure. Can we get a bit more detail on it? Is it again `From > To` (which will indeed be weird) or something different? NOTE: Asking for help (either here or on StackOverflow) is a lot like filing a bug report, try to put as much information as possible, try to structure this information so it is easy to follow. It's also good to tell people what you tried, instead of saying that you tried something and it didn't work.
1421–1427	I thought we talked quite a lot that there is nothing bad with overflows and here we have that if ANY overflow happened, we bail out and don't give any result.

vsavchenko added inline comments.Jun 16 2021, 2:36 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413–1414	OK, I downloaded your patch and ran the debugger. It complains about different bit-width for ranges that the analyzer tries to intersect. What I checked next: what are those ranges, so I took `begin()` from one of them and checked what are `From` and `To` there. Here is one of them: (const llvm::APSInt) $8 = { llvm::APInt = { U = { VAL = 22 pVal = 0x0000000000000016 } BitWidth = 4022311424 } IsUnsigned = true } Woah, this `BitWidth` seems ridiculous! What does it tell us? It definitely didn't get there as part of any reasonable logic, right? So, what can it be instead? Only one answer here - garbage. We have some sort of memory issue with integers that we use as part of our ranges! So, let's see what type of information `Range` class actually stores: class Range { public: Range(const llvm::APSInt &From, const llvm::APSInt &To) : Impl(&From, &To) { assert(From <= To); } ... private: std::pair<const llvm::APSInt , const llvm::APSInt > Impl; }; What we have here is a pair of pointers, and you have: llvm::APSInt Min = LHS.From() + RHS.From(); llvm::APSInt Max = LHS.To() + RHS.To(); llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType); llvm::APSInt Tmax = ValueFactory.getMaxValue(ResultType); These are ALL stack variables. So, pointer to those point to adequate data ONLY while we are inside of our functions. When we call another function, these pointers point into some data from some other function in the call stack, it doesn't point to anything even remotely resembling `llvm::APSInt` and we get `BitWidth = 4022311424`. OK, we figured out the problem, what about a solution? If you look at other implementations, you can notice a couple of interesting things. When you do: llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType); other code does something like: const llvm::APSInt &Tmin = ValueFactory.getMinValue(ResultType); Or when you do: Range Second = Range(Min, Tmax); The sibling code does: return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)}; It looks like `ValueFactory` is the key here! It actually manages the lifetime of those integers for you and, whenever you ask it about values, it gives you `llvm::APSInt &` (note the reference part) that will have the correct lifetime. I hope this gives you some insight into how you can debug things like this on your own and how you can reason about what you see. Another piece of advice is to look around, other `VisitBinaryOperator` methods have all the information you actually needed. If you don't understand why we need `ValueFactory`, - experiment, ask us! It's bad to just ignore it.
1417–1418	Why not just `return RangeFactory.add(ResultRangeSet, Second)`? NOTE: variables with integers in their names is a big no-no.

manas added inline comments.Jun 16 2021, 4:31 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1409	Understood! I will add tests to check each OR part of these conditionals in these cases.
1413–1414	Got it! I am working on fixing this.
1417–1418	Right! I assumed I have to refactor this part so I used this temporarily.
1421–1427	Understood! Should I replace it with code returning EmptySet()?

vsavchenko added inline comments.Jun 16 2021, 5:39 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427	Why `EmptySet()`? `EmptySet()` means that this can never happen, this path is infeasible. Is that the case? Let's say we have: `[INT_MAX - 20, INT_MAX - 10] + [30, 40]` what happens in this case?

manas added inline comments.Jun 16 2021, 6:16 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427	Right! That will not be the case. In this particular case, the range will be `[INT_MIN + 9, INT_MIN + 29]` which is far smaller than `[Tmin, Tmax]`. Also, I think I misunderstood the part of bailing out and not giving any result as returning empty.

vsavchenko added inline comments.Jun 16 2021, 6:20 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427	Gotcha! Because of cases like this, you need to re-think that part about `Min > Max` and maybe count the number of overflows on each side?

manas added inline comments.Jun 16 2021, 6:36 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427	That's right. If it is able to deduce how many times overflows are occurring then it can reason about whether it will be MaxRangeSet or a subset of it. Fixing it!

Reason about cases where Min > Max

The diff fixes all invalid assertion issues and also reasons about the cases where Min > Max.

One thing which is stuck for me is the case where Min <= Max but it overflows. I could reason about that in this way:

If one of Min/Max overflows while the other doesn't then the resulting range should be [Tmin, Tmax] as Min <= Max even after overflowing.
If both of them overflows, then: a. If both overflows on the same side, that is, both wrapped around Tmax, or both wrapped around Tmin, then the range should be [Min, Max] only. b. But if both overflowed in different sides, supposedly Min overflowed on left and Max on right, or vice versa, then the range will be [Tmin, Tmax].

Will it work?

clang/test/Analysis/constant-folding.c
310–311	When I pull these cases out to a separate function (say testAdditionRules2) in constant-folding.c then algorithm is able to reason about the range correctly, but inside testAdditionRules, it is unable to figure stuff out. Does it have something to do with constraints being propagated which we discussed below? @vsavchenko wrote: I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication. if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10; // one goes inside of if, one doesn't . . . } if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10). // 1 path was infeasible: a == 10 and a < 5 // Two of these paths go inside of the if, one doesn't . . . clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE } clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each `clang_analyzer_eval(...)`

Harbormaster completed remote builds in B110049: Diff 353168.Jun 19 2021, 5:14 PM

manas marked 11 inline comments as done.Jun 20 2021, 10:08 PM

vsavchenko added inline comments.Jun 21 2021, 1:22 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397	I commented on this part previously, you shouldn't get fixated on `Min > Max` because `Max >= Min` doesn't imply that overflow didn't happen (less important) and that the range `[Min, Max]` is correct (more important). Recall one of the examples that I had in that email. There is an interesting pattern about results that we can use: no overflows happened -> `[Min, Max]` (should be always true) 1 overflow happened -> we need to invert the range, but there are two different cases: `Min > Max`, perfect, that's what we expected -> `[Tmin, Max] ∪ [Min, Tmax]` `Max >= Min`, we overflowed and wrapped the whole range -> `[Tmin, Tmax]` 2 overflows happened on one side -> `[Min, Max]` 2 overflows happened on both sides -> `[Tmin, Tmax]` You need to think of this problem in terms of what really happens and not in terms of combating with assertions.
1413	What if `LHS.From()` is `Tmin` for signed `T`? What if `T` is unsigned? Does `Tmin - LHS.From()` (aka `0 - LHS.From()`) make sense?
1450–1453	Speaking of unsigned `T`, does it work for unsigned overflows? Do we have tests for that?
clang/test/Analysis/constant-folding.c
310–311	It might be that or it might be something different. Just by looking at this example, the previous `if` statement shouldn't add more paths that go inside of this `if`, so it shouldn't be the root cause. Whenever you encounter problems and you want to tell other people, please, provide more detail. Did you notice it when running the test case? What was the output? What did you try? How did you extract it into a separate function?

manas added inline comments.Jun 21 2021, 1:57 AM

clang/test/Analysis/constant-folding.c

310–311

I put a new test function in constant-folding.c as:

void testAdditionRulesNew(int c, int d) {
  if (c < 0 && c != INT_MIN && d < 0) {
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
  }
}

I tested this specific function as:

./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRulesNew constant-folding.c

And I got the following output:

../clang/test/Analysis/constant-folding.c:338:5: warning: FALSE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}

which is correct.

But when I ran the same test inside testAdditionRules, using:

./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRules constant-folding.c

then I got:

../clang/test/Analysis/constant-folding.c:281:5: warning: FALSE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                          
../clang/test/Analysis/constant-folding.c:281:5: warning: TRUE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, c = [INT_MIN + 1, -1] and d = [INT_MIN, 0], so c + d = [INT_MIN, -2] U [1, INT_MAX]. So c + d == 0 should be false. But in latter case, it is reasoning c + d == 0 to be UNKNOWN.

Also, the error arises in c + d == 0 case only and not in c + d == -1 case. I mistakenly highlighted that case while commenting.

vsavchenko added inline comments.Jun 21 2021, 4:24 AM

clang/test/Analysis/constant-folding.c
310–311	Hmm, I don't know what can be the reason. There are three reasonable approaches: Straightforward: use debugger. Find a place when we ask the solver about this assumption and dig in. Print-based: use `clang_analyzer_printState` inside of this if. Among other things it will print you all constraints that we know about, you can check if they match your expectations. Also, if it is indeed because of residual paths, you will get another print from residual path. Reduction: add the whole `testAdditionRules` into `testAdditionRulesNew` and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on. These are not mutually exclusive, so you can try them in combination.

manas added inline comments.Jun 21 2021, 6:41 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413	Even if `LHS.From()` is `Tmin` for signed `T` the reasoning would be correct. As, CrucialPoint will be `Zero` and hence RHS will be divided into two parts: BeforeZero (`[RHS.From, Zero]`) and AfterAndIncludingZero (`[Zero, RHS.To]`). And if the BeforeZero subrange (which will be responsible for overflowing, contains `Tmin + Tmax` then we can ensure that the final range will be `[Tmin, Tmax]`. And if it does not, (and assuming there is no overflow on the right side), then `1432:1434` will reason the range to be `[Tmin, Max] U [Min, Tmax]`. For unsigned types, actually we are asking the wrong question because an addition of unsigned integers can never overflow from the left side. So in that case, the code deducing the overflows from left-side will not be applicable. And therefore, `0 - LHS.From()` will never give us anything. I was thinking to checked T before applying these rules.
1450–1453	This was not going to make into the patch as it does not solve the case mentioned in FIXME(:1437). An idea which I had in the previous post was this: If one of Min/Max overflows while the other doesn't then the resulting range should be [Tmin, Tmax] as Min <= Max even after overflowing. If both of them overflows, then: a. If both overflows on the same side, that is, both wrapped around Tmax, or both wrapped around Tmin, then the range should be [Min, Max] only. b. But if both overflowed in different sides, supposedly Min overflowed on left and Max on right, or vice versa, then the range will be [Tmin, Tmax].

manas added inline comments.Jun 21 2021, 6:51 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397	I think I agree with you. This pattern (and the code following this pattern) makes much more sense, and easier to follow. I think I will refactor the patch accordingly.

manas added inline comments.Jun 21 2021, 9:17 PM

clang/test/Analysis/constant-folding.c

310–311

I tried print the ProgramState as in both the cases like:

if (c < 0 && c != INT_MIN && d < 0) {
  clang_analyzer_printState();
  clang_analyzer_eval(...);
  ...
}

During testAdditionRulesNew, when I added clang_analyzer_printState just before doing any clang_analyzer_eval calls, I got the following constraints (these are as expected):

"constraints": [                                                                                                                                             
  { "symbol": "reg_$0<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
  { "symbol": "reg_$1<int d>", "range": "{ [-2147483648, -1] }" }                                                                                            
],

In testAdditionRules, adding printState before any evaluation led to:

"constraints": [                                                                                                                                             
  { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
  { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },                                                                                           
  { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }                                                                          
],

"constraints": [
  { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },
  { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },
  { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }
],

A point to note is that this ProgramState is before any clang_analyzer_eval calls have been made. So, the third constraint, in both sets of constraints (for c + d) should not exist there, that is,

a. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }
b. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }

should not be in the constraints set.

And as obvious, the UNKNOWN reasoning for c + d == 0 in testAdditionRules is arising from constraint (a) above, which is a wrong reasoning (it ideally should be [1, INT_MAX] and not [0, INT_MAX]).

manas added inline comments.Jun 21 2021, 11:17 PM

clang/test/Analysis/constant-folding.c

310–311

@vsavchenko wrote:

Reduction: add the whole testAdditionRules into testAdditionRulesNew and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on.

Trying this, I was able to deduce that the following snippet is causing UNKNOWN triggers:

void testAdditionRulesNew(unsigned int a, unsigned int b, int c, int d) {
  if (c < 0 && d < 0) {
    clang_analyzer_printState();
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    clang_analyzer_printState();
  }

  if (c < 0 && c != INT_MIN && d < 0) {
    clang_analyzer_printState();
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    clang_analyzer_printState();
  }
}

The output warnings are as follows:

<snipped-ProgramStates>
../clang/test/Analysis/constant-folding.c:341:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:341:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:347:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:347:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.

The second evaluation is leading to wrong reasoning. I think the constraints are being propagated somehow from first conditional (c < 0 && d < 0) to the second conditional (c < 0 && c != INT_MIN && d < 0).

I have added a pastebin using clang_analyzer_printState here: https://pastebin.com/0E8hmTWh

vsavchenko added inline comments.Jun 22 2021, 2:38 AM

clang/test/Analysis/constant-folding.c
310–311	Ooof, OK. I know what's going on here. `clang_analyzer_eval` actually assumes its condition. So it works as expected when the result is `TRUE` or `FALSE`, but actually splits the path when `UNKNOWN`. This being said, the first conditional doesn't affect you (it is weaker than the second condition), `clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}` does. After that point, you have two states: one where `c + d >= 0` (you called it a.) and the other with `c + d < 0` (b.). Another important point here is located in `SymbolicRangeInferrer::infer(SymbolRef Sym)`. You can see that there are three returns in that function. Your functionality lives in the very last one, namely in `Visit(Sym)`. What it means is that we ALWAYS give preference to existing constraints. And if we don't have any, we start traversing symbolic expressions to get something from that level. This logic is flawed because we can be in a situation like you encountered now. Where we believe that `c + d >= 0`, but if we look at `c` and `d`, we can see that this is actually impossible. It can be solved by always doing `Visit(Sym)` and intersecting results from ALL available sources. It has a simple implementation, but requires additional performance testing. Long story short, that's not your problem. Try to avoid it/rewrite the test.

Added updated logic for reasoning using number of overflows. Also, changed a couple of tests which were leading to unwanted constriants being propagated further.

manas marked 7 inline comments as done.Jun 22 2021, 4:59 PM

Regarding the tweakings in constant-folding.c, I have refrained from using cases which were resulting in UNKNOWN assertions as they were the primary reason for constraints being propagated.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074	We should have these specific functions for other BO as well. Because they will lead us to reason about when `Operand1 binop Operand2` can overflow or not. I was thinking in the direction of having a simpler class which works for this.

manas marked an inline comment as not done.Jun 22 2021, 6:29 PM

Harbormaster completed remote builds in B110535: Diff 353825.Jun 22 2021, 7:29 PM

Hey, it looks like we are finally converging on this one! Great job!

NOTE: I don't know if you noticed, but I want to point out that there are three failing tests:

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074	I searched through a codebase a bit and here are the couple of functions (from `APInt.h`) that can come in handy: // Operations that return overflow indicators. APInt sadd_ov(const APInt &RHS, bool &Overflow) const; APInt uadd_ov(const APInt &RHS, bool &Overflow) const; APInt ssub_ov(const APInt &RHS, bool &Overflow) const; APInt usub_ov(const APInt &RHS, bool &Overflow) const; APInt sdiv_ov(const APInt &RHS, bool &Overflow) const; APInt smul_ov(const APInt &RHS, bool &Overflow) const; APInt umul_ov(const APInt &RHS, bool &Overflow) const; APInt sshl_ov(const APInt &Amt, bool &Overflow) const; APInt ushl_ov(const APInt &Amt, bool &Overflow) const; `APSInt` is derived from `APInt`, so we can totally use these.
clang/test/Analysis/constant-folding.c
299	They are already unsigned, we don't need `a >= 0` and `b >= 0`
360	I don't see the cases where we overflow on both ends and the case where we overflow on one end, but `Min > Max`.

I am working on debugging those 3 cases.

Thanks @vsavchenko and everyone for helping! :)

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074	That's great! I will incorporate those in the code instead of custom functions.
clang/test/Analysis/constant-folding.c
299	Right! I will remove those constraints.
360	My bad. I thought I added both overflows one. I will add them now.

Fix issues involving cases for unsigned type and add tests

manas added inline comments.Jun 23 2021, 10:34 PM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	Using `uadd_ov` (and `sadd_ov`), we can get the added value as well as whether overflow occurred or not. A point is that these functions return `APInt` instead of `APSInt`. But when I tried just using: Min = LHS.From().uadd_ov(RHS.From(), HasMinOverflowed); Max = LHS.To().uadd_ov(RHS.From(), HasMaxOverflowed); instead of Min = LHS.From() + RHS.From(); Max = LHS.To() + RHS.To(); just for the added value, then the following tests failed (these tests and all other tests pass when I use the latter method to get Min/Max): Clang :: Analysis/PR3991.m Clang :: Analysis/global-region-invalidation.c Clang :: Analysis/malloc-overflow2.c Clang :: Analysis/out-of-bounds-new.cpp Clang :: Analysis/taint-generic.c I am working on fixing this part.

manas marked 3 inline comments as done.Jun 23 2021, 11:03 PM

Harbormaster completed remote builds in B110762: Diff 354154.Jun 24 2021, 12:57 AM

Rebase

vsavchenko added inline comments.Jun 24 2021, 1:49 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	You can easily construct `APSInt` from `APInt` using `APSInt ::APSInt(APInt I, bool isUnsigned)` constructor.

manas marked an inline comment as not done.Jun 24 2021, 1:52 AM

manas added inline comments.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	Okay. I will try with using `uadd_ov` only then. And check whether those tests pass or not.

vsavchenko added inline comments.Jun 24 2021, 2:13 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	Hmm, why only `uadd_ov`? What about those tests? How do they fail? Try to look at the reasons and not brute-force by trying different solutions blindly. Those tests are your friends, it's much better to get failures right now then getting them later when you land the patch.

Harbormaster completed remote builds in B110771: Diff 354167.Jun 24 2021, 2:16 AM

manas added inline comments.Jun 24 2021, 2:26 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	I meant related functions as well. Most of them were failing due to `Error evaluating statements`. Although, I have rebased my tree so I will re-check again.

vsavchenko added inline comments.Jun 24 2021, 2:44 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	`Error evaluating statements` just tells you when we crashed (for the context and fast insight). Below this message is usually the real reason (and the last time I checked, it was assertion `From > To` failure).

manas added inline comments.Jun 24 2021, 3:09 AM

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402	My bad. I took a look and found, it was signed mismatch!

Fix issues involving usage of uadd_ov and family of functions

manas marked 3 inline comments as done.Jun 24 2021, 7:55 PM

Harbormaster completed remote builds in B110943: Diff 354424.Jun 24 2021, 8:49 PM

vsavchenko added inline comments.Jun 24 2021, 11:48 PM

clang/test/Analysis/constant-folding.c
294–296	We need a test with ranges and unsigned overflow
360	This comment is marked as "Done", but I still don't these cases.

Add tests for overflows on both ends

manas added inline comments.Jun 24 2021, 11:59 PM

clang/test/Analysis/constant-folding.c
360	Amended. From line 260 and later.

Harbormaster completed remote builds in B110955: Diff 354441.Jun 25 2021, 12:39 AM

Reformat comments

Harbormaster completed remote builds in B111948: Diff 355825.Jul 1 2021, 4:35 AM

Replace literal-value 0 with APSInt Zero

Rebase

Harbormaster completed remote builds in B112252: Diff 356256.Jul 2 2021, 1:42 PM

Here is the proof of correctness of the algorithm using Z3: https://gist.github.com/weirdsmiley/ad6a9dbf3370e96d29f9e90068931d25

In D103440#2891915, @manas wrote:

Here is the proof of correctness of the algorithm using Z3: https://gist.github.com/weirdsmiley/ad6a9dbf3370e96d29f9e90068931d25

There is a couple of fundamental problems there that you need to resolve. I left my comments there.

clang/test/Analysis/constant-folding.c
290	This looks like an overflow on one end (we couldn't have them on both ends with addition and unsigned integers) and wrapping around surpassing the beginning of the range.

Fix test comments

vsavchenko added inline comments.Aug 3 2021, 7:40 AM

clang/utils/analyzer/Dockerfile
15–22 ↗	(On Diff #363734)	Changes to `clang/utils/analyzer/*` don't belong to this patch

manas added inline comments.Aug 3 2021, 7:42 AM

clang/utils/analyzer/Dockerfile
15–22 ↗	(On Diff #363734)	I messed up while `arc updating`. :(

Fix unrelated commits mess up!

I have updated the proof for Add: https://gist.github.com/weirdsmiley/ad6a9dbf3370e96d29f9e90068931d25

Harbormaster completed remote builds in B117641: Diff 363742.Aug 3 2021, 8:35 AM

Revision Contents

Path

Size

clang/

lib/

StaticAnalyzer/

Core/

RangeConstraintManager.cpp

61 lines

test/

Analysis/

constant-folding.c

102 lines

Diff 363742

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

Show All 11 Lines
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/Basic/JsonSupport.h"		#include "clang/Basic/JsonSupport.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/APSIntType.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/APSIntType.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/SValVisitor.h"
		#include "llvm/ADT/APSInt.h"
#include "llvm/ADT/FoldingSet.h"		#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/ImmutableSet.h"		#include "llvm/ADT/ImmutableSet.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code -#include "llvm/ADT/StringExtras.h" Lint: Pre-merge checks: clang-format: please reformat the code ``` -#include "llvm/ADT/StringExtras.h" ```
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
		Lint: Pre-merge checks Inline Actions clang-format: please reformat the code +#include "llvm/ADT/StringExtras.h" Lint: Pre-merge checks: clang-format: please reformat the code ``` +#include "llvm/ADT/StringExtras.h" ```
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"
#include <algorithm>		#include <algorithm>
#include <iterator>		#include <iterator>

using namespace clang;		using namespace clang;
using namespace ento;		using namespace ento;

▲ Show 20 Lines • Show All 923 Lines • ▼ Show 20 Lines	RangeSet VisitBinaryOperator(RangeSet LHS, BinaryOperator::Opcode Op,
RangeSet RHS, QualType T) {		RangeSet RHS, QualType T) {
switch (Op) {		switch (Op) {
case BO_Or:		case BO_Or:
return VisitBinaryOperator<BO_Or>(LHS, RHS, T);		return VisitBinaryOperator<BO_Or>(LHS, RHS, T);
case BO_And:		case BO_And:
return VisitBinaryOperator<BO_And>(LHS, RHS, T);		return VisitBinaryOperator<BO_And>(LHS, RHS, T);
case BO_Rem:		case BO_Rem:
return VisitBinaryOperator<BO_Rem>(LHS, RHS, T);		return VisitBinaryOperator<BO_Rem>(LHS, RHS, T);
		case BO_Add:
		return VisitBinaryOperator<BO_Add>(LHS, RHS, T);
default:		default:
return infer(T);		return infer(T);
}		}
}		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Ranges and operators		// Ranges and operators
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
▲ Show 20 Lines • Show All 90 Lines • ▼ Show 20 Lines	Range getSymmetricalRange(Range Origin, QualType T) {

// Intersection is guaranteed to be non-empty.		// Intersection is guaranteed to be non-empty.
return {ValueFactory.getValue(-AbsMax), ValueFactory.getValue(AbsMax)};		return {ValueFactory.getValue(-AbsMax), ValueFactory.getValue(AbsMax)};
}		}

/// Return a range set subtracting zero from \p Domain.		/// Return a range set subtracting zero from \p Domain.
RangeSet assumeNonZero(RangeSet Domain, QualType T) {		RangeSet assumeNonZero(RangeSet Domain, QualType T) {
APSIntType IntType = ValueFactory.getAPSIntType(T);		APSIntType IntType = ValueFactory.getAPSIntType(T);
return RangeFactory.deletePoint(Domain, IntType.getZeroValue());		return RangeFactory.deletePoint(Domain, IntType.getZeroValue());
}		}
		manasAuthorUnsubmitted Done Reply Inline Actions We should have these specific functions for other BO as well. Because they will lead us to reason about when `Operand1 binop Operand2` can overflow or not. I was thinking in the direction of having a simpler class which works for this. manas: We should have these specific functions for other BO as well. Because they will lead us to…
		vsavchenkoUnsubmitted Done Reply Inline Actions I searched through a codebase a bit and here are the couple of functions (from `APInt.h`) that can come in handy: // Operations that return overflow indicators. APInt sadd_ov(const APInt &RHS, bool &Overflow) const; APInt uadd_ov(const APInt &RHS, bool &Overflow) const; APInt ssub_ov(const APInt &RHS, bool &Overflow) const; APInt usub_ov(const APInt &RHS, bool &Overflow) const; APInt sdiv_ov(const APInt &RHS, bool &Overflow) const; APInt smul_ov(const APInt &RHS, bool &Overflow) const; APInt umul_ov(const APInt &RHS, bool &Overflow) const; APInt sshl_ov(const APInt &Amt, bool &Overflow) const; APInt ushl_ov(const APInt &Amt, bool &Overflow) const; `APSInt` is derived from `APInt`, so we can totally use these. vsavchenko: I searched through a codebase a bit and here are the couple of functions (from `APInt.h`) that…
		manasAuthorUnsubmitted Done Reply Inline Actions That's great! I will incorporate those in the code instead of custom functions. manas: That's great! I will incorporate those in the code instead of custom functions.

// FIXME: Once SValBuilder supports unary minus, we should use SValBuilder to		// FIXME: Once SValBuilder supports unary minus, we should use SValBuilder to
// obtain the negated symbolic expression instead of constructing the		// obtain the negated symbolic expression instead of constructing the
// symbol manually. This will allow us to support finding ranges of not		// symbol manually. This will allow us to support finding ranges of not
// only negated SymSymExpr-type expressions, but also of other, simpler		// only negated SymSymExpr-type expressions, but also of other, simpler
// expressions which we currently do not know how to negate.		// expressions which we currently do not know how to negate.
Optional<RangeSet> getRangeForNegatedSub(SymbolRef Sym) {		Optional<RangeSet> getRangeForNegatedSub(SymbolRef Sym) {
if (const SymSymExpr *SSE = dyn_cast<SymSymExpr>(Sym)) {		if (const SymSymExpr *SSE = dyn_cast<SymSymExpr>(Sym)) {
▲ Show 20 Lines • Show All 296 Lines • ▼ Show 20 Lines	if (IsLHSPositiveOrZero && IsRHSPositiveOrZero) {
Min = LHS.To() < RHS.From() ? LHS.From() : Zero;		Min = LHS.To() < RHS.From() ? LHS.From() : Zero;
}		}

// Nevertheless, the symmetrical range for RHS is a conservative estimate		// Nevertheless, the symmetrical range for RHS is a conservative estimate
// for any sign of either LHS, or RHS.		// for any sign of either LHS, or RHS.
return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)};		return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)};
}		}

		template <>
		RangeSet SymbolicRangeInferrer::VisitBinaryOperator<BO_Add>(Range LHS,
		Range RHS,
		QualType T) {
		APSIntType ResultType = ValueFactory.getAPSIntType(T);
		llvm::APSInt Zero = ResultType.getZeroValue();
		const llvm::APSInt &Tmin = ValueFactory.getMinValue(ResultType);
		const llvm::APSInt &Tmax = ValueFactory.getMaxValue(ResultType);

		llvm::APSInt Min, Max;
		bool HasMinOverflowed, HasMaxOverflowed;
		vsavchenkoUnsubmitted Done Reply Inline Actions I commented on this part previously, you shouldn't get fixated on `Min > Max` because `Max >= Min` doesn't imply that overflow didn't happen (less important) and that the range `[Min, Max]` is correct (more important). Recall one of the examples that I had in that email. There is an interesting pattern about results that we can use: no overflows happened -> `[Min, Max]` (should be always true) 1 overflow happened -> we need to invert the range, but there are two different cases: `Min > Max`, perfect, that's what we expected -> `[Tmin, Max] ∪ [Min, Tmax]` `Max >= Min`, we overflowed and wrapped the whole range -> `[Tmin, Tmax]` 2 overflows happened on one side -> `[Min, Max]` 2 overflows happened on both sides -> `[Tmin, Tmax]` You need to think of this problem in terms of what really happens and not in terms of combating with assertions. vsavchenko: I commented on this part previously, you shouldn't get fixated on `Min > Max` because `Max >=…
		manasAuthorUnsubmitted Done Reply Inline Actions I think I agree with you. This pattern (and the code following this pattern) makes much more sense, and easier to follow. I think I will refactor the patch accordingly. manas: I think I agree with you. This pattern (and the code following this pattern) makes much more…

		// Based on their signedness, compute values for Min and Max, and also store
		// whether those values have overflowed or not.
		if (ResultType.isUnsigned()) {
		Min = llvm::APSInt(LHS.From().uadd_ov(RHS.From(), HasMinOverflowed), true);
		manasAuthorUnsubmitted Done Reply Inline Actions Using `uadd_ov` (and `sadd_ov`), we can get the added value as well as whether overflow occurred or not. A point is that these functions return `APInt` instead of `APSInt`. But when I tried just using: Min = LHS.From().uadd_ov(RHS.From(), HasMinOverflowed); Max = LHS.To().uadd_ov(RHS.From(), HasMaxOverflowed); instead of Min = LHS.From() + RHS.From(); Max = LHS.To() + RHS.To(); just for the added value, then the following tests failed (these tests and all other tests pass when I use the latter method to get Min/Max): Clang :: Analysis/PR3991.m Clang :: Analysis/global-region-invalidation.c Clang :: Analysis/malloc-overflow2.c Clang :: Analysis/out-of-bounds-new.cpp Clang :: Analysis/taint-generic.c I am working on fixing this part. manas: Using `uadd_ov` (and `sadd_ov`), we can get the added value as well as whether overflow…
		vsavchenkoUnsubmitted Done Reply Inline Actions You can easily construct `APSInt` from `APInt` using `APSInt ::APSInt(APInt I, bool isUnsigned)` constructor. vsavchenko: You can easily construct `APSInt` from `APInt` using `APSInt ::APSInt(APInt I, bool…
		manasAuthorUnsubmitted Done Reply Inline Actions Okay. I will try with using `uadd_ov` only then. And check whether those tests pass or not. manas: Okay. I will try with using `uadd_ov` only then. And check whether those tests pass or not.
		vsavchenkoUnsubmitted Done Reply Inline Actions Hmm, why only `uadd_ov`? What about those tests? How do they fail? Try to look at the reasons and not brute-force by trying different solutions blindly. Those tests are your friends, it's much better to get failures right now then getting them later when you land the patch. vsavchenko: Hmm, why only `uadd_ov`? What about those tests? How do they fail? Try to look at the…
		manasAuthorUnsubmitted Done Reply Inline Actions I meant related functions as well. Most of them were failing due to `Error evaluating statements`. Although, I have rebased my tree so I will re-check again. manas: I meant related functions as well. Most of them were failing due to `Error evaluating…
		vsavchenkoUnsubmitted Done Reply Inline Actions `Error evaluating statements` just tells you when we crashed (for the context and fast insight). Below this message is usually the real reason (and the last time I checked, it was assertion `From > To` failure). vsavchenko: `Error evaluating statements` just tells you when we crashed (for the context and fast insight).
		manasAuthorUnsubmitted Done Reply Inline Actions My bad. I took a look and found, it was signed mismatch! manas: My bad. I took a look and found, it was signed mismatch!
		Max = llvm::APSInt(LHS.To().uadd_ov(RHS.To(), HasMaxOverflowed), true);
		} else {
		Min = llvm::APSInt(LHS.From().sadd_ov(RHS.From(), HasMinOverflowed), false);
		Max = llvm::APSInt(LHS.To().sadd_ov(RHS.To(), HasMaxOverflowed), false);
		}

		// If no overflow occured then the range [Min, Max] is correct
		vsavchenkoUnsubmitted Done Reply Inline Actions This clause is exactly the same as the previous one, it is a mistake. And I think we should have a test that could've shown that. Also, since you are checking for overflows for both the beginning and the end, we should have tests where both overflow. vsavchenko: This clause is exactly the same as the previous one, it is a mistake. And I think we should…
		manasAuthorUnsubmitted Done Reply Inline Actions Understood! I will add tests to check each OR part of these conditionals in these cases. manas: Understood! I will add tests to check each OR part of these conditionals in these cases.
		if (!HasMinOverflowed && !HasMaxOverflowed) {
		return {RangeFactory, ValueFactory.getValue(Min),
		ValueFactory.getValue(Max)};
		}
		vsavchenkoUnsubmitted Done Reply Inline Actions What if `LHS.From()` is `Tmin` for signed `T`? What if `T` is unsigned? Does `Tmin - LHS.From()` (aka `0 - LHS.From()`) make sense? vsavchenko: What if `LHS.From()` is `Tmin` for signed `T`? What if `T` is unsigned? Does `Tmin - LHS.From…
		manasAuthorUnsubmitted Done Reply Inline Actions Even if `LHS.From()` is `Tmin` for signed `T` the reasoning would be correct. As, CrucialPoint will be `Zero` and hence RHS will be divided into two parts: BeforeZero (`[RHS.From, Zero]`) and AfterAndIncludingZero (`[Zero, RHS.To]`). And if the BeforeZero subrange (which will be responsible for overflowing, contains `Tmin + Tmax` then we can ensure that the final range will be `[Tmin, Tmax]`. And if it does not, (and assuming there is no overflow on the right side), then `1432:1434` will reason the range to be `[Tmin, Max] U [Min, Tmax]`. For unsigned types, actually we are asking the wrong question because an addition of unsigned integers can never overflow from the left side. So in that case, the code deducing the overflows from left-side will not be applicable. And therefore, `0 - LHS.From()` will never give us anything. I was thinking to checked T before applying these rules. manas: Even if `LHS.From()` is `Tmin` for signed `T` the reasoning would be correct. As, CrucialPoint…

		manasAuthorUnsubmitted Done Reply Inline Actions I changed the logic from using getCrucialPoints (which I mentioned in the e-mail thread) to simple checks to determine overflows using signedness. But the failed assertions again popped up. And I was unable to pin-point the issue here. Can someone help me? Although, I am thinking of revising the above logic by using bitwise methods to detect overflows. manas: I changed the logic from using getCrucialPoints (which I mentioned in the e-mail thread) to…
		vsavchenkoUnsubmitted Done Reply Inline Actions This is actually one place that I won't expect an assertion failure. Can we get a bit more detail on it? Is it again `From > To` (which will indeed be weird) or something different? NOTE: Asking for help (either here or on StackOverflow) is a lot like filing a bug report, try to put as much information as possible, try to structure this information so it is easy to follow. It's also good to tell people what you tried, instead of saying that you tried something and it didn't work. vsavchenko: This is actually one place that I won't expect an assertion failure. Can we get a bit more…
		vsavchenkoUnsubmitted Done Reply Inline Actions OK, I downloaded your patch and ran the debugger. It complains about different bit-width for ranges that the analyzer tries to intersect. What I checked next: what are those ranges, so I took `begin()` from one of them and checked what are `From` and `To` there. Here is one of them: (const llvm::APSInt) $8 = { llvm::APInt = { U = { VAL = 22 pVal = 0x0000000000000016 } BitWidth = 4022311424 } IsUnsigned = true } Woah, this `BitWidth` seems ridiculous! What does it tell us? It definitely didn't get there as part of any reasonable logic, right? So, what can it be instead? Only one answer here - garbage. We have some sort of memory issue with integers that we use as part of our ranges! So, let's see what type of information `Range` class actually stores: class Range { public: Range(const llvm::APSInt &From, const llvm::APSInt &To) : Impl(&From, &To) { assert(From <= To); } ... private: std::pair<const llvm::APSInt , const llvm::APSInt > Impl; }; What we have here is a pair of pointers, and you have: llvm::APSInt Min = LHS.From() + RHS.From(); llvm::APSInt Max = LHS.To() + RHS.To(); llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType); llvm::APSInt Tmax = ValueFactory.getMaxValue(ResultType); These are ALL stack variables. So, pointer to those point to adequate data ONLY while we are inside of our functions. When we call another function, these pointers point into some data from some other function in the call stack, it doesn't point to anything even remotely resembling `llvm::APSInt` and we get `BitWidth = 4022311424`. OK, we figured out the problem, what about a solution? If you look at other implementations, you can notice a couple of interesting things. When you do: llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType); other code does something like: const llvm::APSInt &Tmin = ValueFactory.getMinValue(ResultType); Or when you do: Range Second = Range(Min, Tmax); The sibling code does: return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)}; It looks like `ValueFactory` is the key here! It actually manages the lifetime of those integers for you and, whenever you ask it about values, it gives you `llvm::APSInt &` (note the reference part) that will have the correct lifetime. I hope this gives you some insight into how you can debug things like this on your own and how you can reason about what you see. Another piece of advice is to look around, other `VisitBinaryOperator` methods have all the information you actually needed. If you don't understand why we need `ValueFactory`, - experiment, ask us! It's bad to just ignore it. vsavchenko: OK, I downloaded your patch and ran the debugger. It complains about different bit-width for…
		manasAuthorUnsubmitted Done Reply Inline Actions Got it! I am working on fixing this. manas: Got it! I am working on fixing this.
		// In case of only one overflow, the two possibilities are:
		// 1. The overflowing value overlaps the other boundary value, in which case,
		// the range is entire range set [Tmin, Tmax]
		// 2. The values do not get overlapped, which results in segmented ranges
		vsavchenkoUnsubmitted Done Reply Inline Actions Why not just `return RangeFactory.add(ResultRangeSet, Second)`? NOTE: variables with integers in their names is a big no-no. vsavchenko: Why not just `return RangeFactory.add(ResultRangeSet, Second)`? NOTE: variables with integers…
		manasAuthorUnsubmitted Done Reply Inline Actions Right! I assumed I have to refactor this part so I used this temporarily. manas: Right! I assumed I have to refactor this part so I used this temporarily.
		// [Tmin, Max] U [Min, Tmax]
		if (HasMinOverflowed ^ HasMaxOverflowed) {
		if (Min > Max) {
		RangeSet Result(RangeFactory, Tmin, ValueFactory.getValue(Max));
		return RangeFactory.add(Result,
		{RangeFactory, ValueFactory.getValue(Min), Tmax});
		}
		return {RangeFactory, Tmin, Tmax};
		}
		vsavchenkoUnsubmitted Done Reply Inline Actions I thought we talked quite a lot that there is nothing bad with overflows and here we have that if ANY overflow happened, we bail out and don't give any result. vsavchenko: I thought we talked quite a lot that there is nothing bad with overflows and here we have that…
		manasAuthorUnsubmitted Done Reply Inline Actions Understood! Should I replace it with code returning EmptySet()? manas: Understood! Should I replace it with code returning EmptySet()?
		vsavchenkoUnsubmitted Done Reply Inline Actions Why `EmptySet()`? `EmptySet()` means that this can never happen, this path is infeasible. Is that the case? Let's say we have: `[INT_MAX - 20, INT_MAX - 10] + [30, 40]` what happens in this case? vsavchenko: Why `EmptySet()`? `EmptySet()` means that this can never happen, this path is infeasible. Is…
		manasAuthorUnsubmitted Done Reply Inline Actions Right! That will not be the case. In this particular case, the range will be `[INT_MIN + 9, INT_MIN + 29]` which is far smaller than `[Tmin, Tmax]`. Also, I think I misunderstood the part of bailing out and not giving any result as returning empty. manas: Right! That will not be the case. In this particular case, the range will be `[INT_MIN + 9…
		vsavchenkoUnsubmitted Done Reply Inline Actions Gotcha! Because of cases like this, you need to re-think that part about `Min > Max` and maybe count the number of overflows on each side? vsavchenko: Gotcha! Because of cases like this, you need to re-think that part about `Min > Max` and maybe…
		manasAuthorUnsubmitted Done Reply Inline Actions That's right. If it is able to deduce how many times overflows are occurring then it can reason about whether it will be MaxRangeSet or a subset of it. Fixing it! manas: That's right. If it is able to deduce how many times overflows are occurring then it can reason…

		// If both boundary values overflow on the same side then rangeset is
		// [Min, Max]
		if (((LHS.From() > Zero && RHS.From() > Zero) &&
		(LHS.To() > Zero && RHS.To() > Zero)) \|\|
		((LHS.From() < Zero && RHS.From() < Zero) &&
		(LHS.To() < Zero && RHS.To() < Zero))) {
		return {RangeFactory, ValueFactory.getValue(Min),
		ValueFactory.getValue(Max)};
		}

		// When both boundary values overflow from different sides then rangeset is
		// [Tmin, Tmax] because the entire rangeset is already covered and overflow
		// from opposite sides ensure already computed points to be reached again.
		return {RangeFactory, Tmin, Tmax};
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Constraint assignment logic		// Constraint assignment logic
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

/// ConstraintAssignorBase is a small utility class that unifies visitor		/// ConstraintAssignorBase is a small utility class that unifies visitor
/// for ranges with a visitor for constraints (rangeset/range/constant).		/// for ranges with a visitor for constraints (rangeset/range/constant).
///		///
/// It is designed to have one derived class, but generally it can have more.		/// It is designed to have one derived class, but generally it can have more.
/// Derived class can control which types we handle by defining methods of the		/// Derived class can control which types we handle by defining methods of the
		vsavchenkoUnsubmitted Done Reply Inline Actions Speaking of unsigned `T`, does it work for unsigned overflows? Do we have tests for that? vsavchenko: Speaking of unsigned `T`, does it work for unsigned overflows? Do we have tests for that?
		manasAuthorUnsubmitted Done Reply Inline Actions This was not going to make into the patch as it does not solve the case mentioned in FIXME(:1437). An idea which I had in the previous post was this: If one of Min/Max overflows while the other doesn't then the resulting range should be [Tmin, Tmax] as Min <= Max even after overflowing. If both of them overflows, then: a. If both overflows on the same side, that is, both wrapped around Tmax, or both wrapped around Tmin, then the range should be [Min, Max] only. b. But if both overflowed in different sides, supposedly Min overflowed on left and Max on right, or vice versa, then the range will be [Tmin, Tmax]. manas: This was not going to make into the patch as it does not solve the case mentioned in FIXME…
/// following form:		/// following form:
///		///
/// bool handle${SYMBOL}To${CONSTRAINT}(const SYMBOL *Sym,		/// bool handle${SYMBOL}To${CONSTRAINT}(const SYMBOL *Sym,
/// CONSTRAINT Constraint);		/// CONSTRAINT Constraint);
///		///
/// where SYMBOL is the type of the symbol (e.g. SymSymExpr, SymbolCast, etc.)		/// where SYMBOL is the type of the symbol (e.g. SymSymExpr, SymbolCast, etc.)
/// CONSTRAINT is the type of constraint (RangeSet/Range/Const)		/// CONSTRAINT is the type of constraint (RangeSet/Range/Const)
/// return value signifies whether we should try other handle methods		/// return value signifies whether we should try other handle methods
▲ Show 20 Lines • Show All 1,414 Lines • Show Last 20 Lines

clang/test/Analysis/constant-folding.c

Show First 20 Lines • Show All 275 Lines • ▼ Show 20 Lines	void testRemainderRules(unsigned int a, unsigned int b, int c, int d) {

// Check that we work correctly when minimal unsigned value from a range is		// Check that we work correctly when minimal unsigned value from a range is
// equal to the signed minimum for the same bit width.		// equal to the signed minimum for the same bit width.
unsigned int x = INT_MIN;		unsigned int x = INT_MIN;
if (a >= x && a <= x + 10) {		if (a >= x && a <= x + 10) {
clang_analyzer_eval((b % a) < x + 10); // expected-warning{{TRUE}}		clang_analyzer_eval((b % a) < x + 10); // expected-warning{{TRUE}}
}		}
}		}

		void testAdditionRules(unsigned int a, unsigned int b, int c, int d) {
		if (a == 0) {
		clang_analyzer_eval((a + 0) == 0); // expected-warning{{TRUE}}
		}

		// Overflows on Tmax side
		vsavchenkoUnsubmitted Done Reply Inline Actions This looks like an overflow on one end (we couldn't have them on both ends with addition and unsigned integers) and wrapping around surpassing the beginning of the range. vsavchenko: This looks like an overflow on one end (we couldn't have them on both ends with addition and…
		if (a >= 5 && a <= UINT_MAX - 5 && b <= 10) {
		clang_analyzer_eval((a + b) >= UINT_MAX >> 1); // expected-warning{{UNKNOWN}}
		clang_analyzer_eval((a + b) <= UINT_MAX >> 1); // expected-warning{{UNKNOWN}}
		}

		xazax.hunUnsubmitted Done Reply Inline Actions Since both `a` and `b` are unsigned values, their sum is also unsigned. We basically check if their sum could be zero. I don't see why it could not in this case (e.g. when `b == UINT_MAX - a + 1`). Do I miss something? xazax.hun: Since both `a` and `b` are unsigned values, their sum is also unsigned. We basically check if…
		manasAuthorUnsubmitted Done Reply Inline Actions I am editing that condition to be `b >= INT_MAX`. manas: I am editing that condition to be `b >= INT_MAX`.
		// Overflows on both ends
		xazax.hunUnsubmitted Done Reply Inline Actions If `a == INT_MAX + 1 && b == 0` then `(a + b) > 0`. Do I miss something? Or is this a bug? xazax.hun: If `a == INT_MAX + 1 && b == 0` then `(a + b) > 0`. Do I miss something? Or is this a bug?
		vsavchenkoUnsubmitted Done Reply Inline Actions We need a test with ranges and unsigned overflow vsavchenko: We need a test with ranges and unsigned overflow
		if (c >= INT_MIN + 5 && c <= INT_MAX - 5 && d >= -10 && d <= 10) {
		clang_analyzer_eval((c + d) <= 0); // expected-warning{{UNKNOWN}}
		clang_analyzer_eval((c + d) >= 0); // expected-warning{{UNKNOWN}}
		vsavchenkoUnsubmitted Done Reply Inline Actions They are already unsigned, we don't need `a >= 0` and `b >= 0` vsavchenko: They are already unsigned, we don't need `a >= 0` and `b >= 0`
		manasAuthorUnsubmitted Done Reply Inline Actions Right! I will remove those constraints. manas: Right! I will remove those constraints.
		}

		// Checks for unsigned operands
		clang_analyzer_eval((a + b) < 0); // expected-warning{{FALSE}}
		clang_analyzer_eval((a + b) <= UINT_MAX); // expected-warning{{TRUE}}

		vsavchenkoUnsubmitted Done Reply Inline Actions I think this is always true. vsavchenko: I think this is always true.
		xazax.hunUnsubmitted Done Reply Inline Actions So is `a >= 0 && b >= 0` unless I miss something. xazax.hun: So is `a >= 0 && b >= 0` unless I miss something.
		manasAuthorUnsubmitted Done Reply Inline Actions @vsavchenko true, I am removing that conditional as it serves no purpose there. @xazax.hun that's true. manas: @vsavchenko true, I am removing that conditional as it serves no purpose there. @xazax.hun…
		if (a == UINT_MAX && b == UINT_MAX) {
		clang_analyzer_eval((a + b) == UINT_MAX - 1); // expected-warning{{TRUE}}
		}

		vsavchenkoUnsubmitted Done Reply Inline Actions Good! vsavchenko: Good!
		// Checks for inclusive ranges for unsigned integers
		if (a <= 10 && b <= 20) {
		manasAuthorUnsubmitted Done Reply Inline Actions When I pull these cases out to a separate function (say testAdditionRules2) in constant-folding.c then algorithm is able to reason about the range correctly, but inside testAdditionRules, it is unable to figure stuff out. Does it have something to do with constraints being propagated which we discussed below? @vsavchenko wrote: I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication. if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10; // one goes inside of if, one doesn't . . . } if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10). // 1 path was infeasible: a == 10 and a < 5 // Two of these paths go inside of the if, one doesn't . . . clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE } clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each `clang_analyzer_eval(...)` manas: When I pull these cases out to a separate function (say testAdditionRules2) in constant-folding.
		vsavchenkoUnsubmitted Done Reply Inline Actions It might be that or it might be something different. Just by looking at this example, the previous `if` statement shouldn't add more paths that go inside of this `if`, so it shouldn't be the root cause. Whenever you encounter problems and you want to tell other people, please, provide more detail. Did you notice it when running the test case? What was the output? What did you try? How did you extract it into a separate function? vsavchenko: It might be that or it might be something different. Just by looking at this example, the…
		manasAuthorUnsubmitted Done Reply Inline Actions I put a new test function in `constant-folding.c` as: void testAdditionRulesNew(int c, int d) { if (c < 0 && c != INT_MIN && d < 0) { clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} } } I tested this specific function as: ./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRulesNew constant-folding.c And I got the following output: ../clang/test/Analysis/constant-folding.c:338:5: warning: FALSE [debug.ExprInspection] clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} which is correct. But when I ran the same test inside `testAdditionRules`, using: ./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRules constant-folding.c then I got: ../clang/test/Analysis/constant-folding.c:281:5: warning: FALSE [debug.ExprInspection] clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../clang/test/Analysis/constant-folding.c:281:5: warning: TRUE [debug.ExprInspection] clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Here, `c = [INT_MIN + 1, -1]` and `d = [INT_MIN, 0]`, so `c + d = [INT_MIN, -2] U [1, INT_MAX]`. So `c + d == 0` should be false. But in latter case, it is reasoning `c + d == 0` to be `UNKNOWN`. Also, the error arises in `c + d == 0` case only and not in `c + d == -1` case. I mistakenly highlighted that case while commenting. manas: I put a new test function in `constant-folding.c` as: ``` void testAdditionRulesNew(int c, int…
		vsavchenkoUnsubmitted Done Reply Inline Actions Hmm, I don't know what can be the reason. There are three reasonable approaches: Straightforward: use debugger. Find a place when we ask the solver about this assumption and dig in. Print-based: use `clang_analyzer_printState` inside of this if. Among other things it will print you all constraints that we know about, you can check if they match your expectations. Also, if it is indeed because of residual paths, you will get another print from residual path. Reduction: add the whole `testAdditionRules` into `testAdditionRulesNew` and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on. These are not mutually exclusive, so you can try them in combination. vsavchenko: Hmm, I don't know what can be the reason. There are three reasonable approaches: 1.
		manasAuthorUnsubmitted Done Reply Inline Actions I tried print the ProgramState as in both the cases like: if (c < 0 && c != INT_MIN && d < 0) { clang_analyzer_printState(); clang_analyzer_eval(...); ... } During `testAdditionRulesNew`, when I added `clang_analyzer_printState` just before doing any `clang_analyzer_eval` calls, I got the following constraints (these are as expected): "constraints": [ { "symbol": "reg_$0<int c>", "range": "{ [-2147483647, -1] }" }, { "symbol": "reg_$1<int d>", "range": "{ [-2147483648, -1] }" } ], In `testAdditionRules`, adding `printState` before any evaluation led to: "constraints": [ { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" }, { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" }, { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" } ], "constraints": [ { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" }, { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" }, { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" } ], A point to note is that this ProgramState is before any `clang_analyzer_eval` calls have been made. So, the third constraint, in both sets of constraints (for `c + d`) should not exist there, that is, a. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" } b. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" } should not be in the constraints set. And as obvious, the `UNKNOWN` reasoning for `c + d == 0` in `testAdditionRules` is arising from constraint (a) above, which is a wrong reasoning (it ideally should be `[1, INT_MAX]` and not `[0, INT_MAX]`). manas: I tried print the ProgramState as in both the cases like: if (c < 0 && c != INT_MIN && d <…
		manasAuthorUnsubmitted Done Reply Inline Actions @vsavchenko wrote: Reduction: add the whole `testAdditionRules` into `testAdditionRulesNew` and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on. Trying this, I was able to deduce that the following snippet is causing `UNKNOWN` triggers: void testAdditionRulesNew(unsigned int a, unsigned int b, int c, int d) { if (c < 0 && d < 0) { clang_analyzer_printState(); clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}} clang_analyzer_printState(); } if (c < 0 && c != INT_MIN && d < 0) { clang_analyzer_printState(); clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} clang_analyzer_printState(); } } The output warnings are as follows: <snipped-ProgramStates> ../clang/test/Analysis/constant-folding.c:341:5: warning: FALSE [debug.ExprInspection] clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../clang/test/Analysis/constant-folding.c:341:5: warning: TRUE [debug.ExprInspection] clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../clang/test/Analysis/constant-folding.c:347:5: warning: FALSE [debug.ExprInspection] clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ../clang/test/Analysis/constant-folding.c:347:5: warning: TRUE [debug.ExprInspection] clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}} ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 4 warnings generated. The second evaluation is leading to wrong reasoning. I think the constraints are being propagated somehow from first conditional (`c < 0 && d < 0`) to the second conditional (`c < 0 && c != INT_MIN && d < 0`). I have added a pastebin using `clang_analyzer_printState` here: https://pastebin.com/0E8hmTWh manas: @vsavchenko wrote: > 3. Reduction: add the whole `testAdditionRules` into…
		vsavchenkoUnsubmitted Done Reply Inline Actions Ooof, OK. I know what's going on here. `clang_analyzer_eval` actually assumes its condition. So it works as expected when the result is `TRUE` or `FALSE`, but actually splits the path when `UNKNOWN`. This being said, the first conditional doesn't affect you (it is weaker than the second condition), `clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}` does. After that point, you have two states: one where `c + d >= 0` (you called it a.) and the other with `c + d < 0` (b.). Another important point here is located in `SymbolicRangeInferrer::infer(SymbolRef Sym)`. You can see that there are three returns in that function. Your functionality lives in the very last one, namely in `Visit(Sym)`. What it means is that we ALWAYS give preference to existing constraints. And if we don't have any, we start traversing symbolic expressions to get something from that level. This logic is flawed because we can be in a situation like you encountered now. Where we believe that `c + d >= 0`, but if we look at `c` and `d`, we can see that this is actually impossible. It can be solved by always doing `Visit(Sym)` and intersecting results from ALL available sources. It has a simple implementation, but requires additional performance testing. Long story short, that's not your problem. Try to avoid it/rewrite the test. vsavchenko: Ooof, OK. I know what's going on here. `clang_analyzer_eval` actually assumes its…
		clang_analyzer_eval((a + b) >= 0); // expected-warning{{TRUE}}
		xazax.hunUnsubmitted Done Reply Inline Actions I think `UINT_MAX + UINT_MAX` supposed to be positive. Do I miss something here? xazax.hun: I think `UINT_MAX + UINT_MAX` supposed to be positive. Do I miss something here?
		manasAuthorUnsubmitted Done Reply Inline Actions Should not `UINT_MAX + UINT_MAX` wrap around to be negative? I think I misunderstood it. manas: Should not `UINT_MAX + UINT_MAX` wrap around to be negative? I think I misunderstood it.
		manasAuthorUnsubmitted Done Reply Inline Actions Understood. My bad. manas: Understood. My bad.
		vsavchenkoUnsubmitted Done Reply Inline Actions Great, it's good to check negative numbers. I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication. if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10; // one goes inside of if, one doesn't . . . } if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10). // 1 path was infeasible: a == 10 and a < 5 // Two of these paths go inside of the if, one doesn't . . . clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE } clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each `clang_analyzer_eval(...)` In this example, you still have residual paths where `c != INT_MIN`, `c == INT_MIN and d != INT_MIN`, and `c == INT_MIN and d == INT_MIN`. vsavchenko: Great, it's good to check negative numbers. I have one note here. The fact that we don't have…
		manasAuthorUnsubmitted Done Reply Inline Actions I should add tests for these paths as well so that these can be checked. For further cases, I will enforce single path evaluation in test cases (which will make it easier to handle here). manas: I should add tests for these paths as well so that these can be checked. For further cases, I…
		vsavchenkoUnsubmitted Done Reply Inline Actions In these tests we should really avoid having multiple feasible paths going through one `clang_analyzer_eval`. We can add `// expected-warning` to it multiple times, but it's harder to understand as a reader of these tests because you need to split all the paths in your mind just like the analyzer does. It can be useful when testing other features of the analyzer, but here it's not essential. And I don't think that you fully understood me. When multiple paths go through `clang_analyzer_eval`, each of them produces a separate "warning". That's why if you want to account for residual paths, you shouldn't add more `if` statements, but add something like: clang_analyzer_eval(a == 10); // expected-warning{{TRUE}} // expected-warning@-1{{TRUE}} // expected-warning@-2{{FALSE}} So, please, instead of writing tests like this, rewrite your test, so that we don't have residual paths. vsavchenko: In these tests we should really avoid having multiple feasible paths going through one…
		manasAuthorUnsubmitted Done Reply Inline Actions Understood. I am changing those for single path evaluation accordingly. manas: Understood. I am changing those for single path evaluation accordingly.
		clang_analyzer_eval((a + b) > 30); // expected-warning{{FALSE}}
		}

		// Checks for negative signed integers
		if (c < 0 && d < 0) {
		vsavchenkoUnsubmitted Done Reply Inline Actions I'm not sure I understand why do you make this assumption. vsavchenko: I'm not sure I understand why do you make this assumption.
		clang_analyzer_eval((c + d) != -1); // expected-warning{{TRUE}}
		}

		if (c < 0 && c != INT_MIN && d < 0) {
		vsavchenkoUnsubmitted Done Reply Inline Actions Here is the same note, `clang_analyzer_eval` doesn't produce `TRUE` only because this condition might be true. vsavchenko: Here is the same note, `clang_analyzer_eval` doesn't produce `TRUE` only because this condition…
		manasAuthorUnsubmitted Done Reply Inline Actions understood. manas: understood.
		clang_analyzer_eval((c + d) == -1); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) <= -2); // expected-warning{{UNKNOWN}}
		clang_analyzer_eval((c + d) >= 1); // expected-warning{{UNKNOWN}}
		}

		if (c == INT_MIN && d == INT_MIN) {
		vsavchenkoUnsubmitted Done Reply Inline Actions If `c == d == 1` this doesn't hold. Did you want to check here that we account for overflows? vsavchenko: If `c == d == 1` this doesn't hold. Did you want to check here that we account for overflows?
		manasAuthorUnsubmitted Done Reply Inline Actions Yes, I was checking for that. I am changing that case. manas: Yes, I was checking for that. I am changing that case.
		clang_analyzer_eval((c + d) == 0); // expected-warning{{TRUE}}
		}

		if (c == INT_MIN && d < 0 && d != INT_MIN) {
		vsavchenkoUnsubmitted Done Reply Inline Actions What about other cases here? I mean specifically when two ranges are have a form `[A, B]` vsavchenko: What about other cases here? I mean specifically when two ranges are have a form `[A, B]`
		manasAuthorUnsubmitted Done Reply Inline Actions I am adding more cases here. manas: I am adding more cases here.
		clang_analyzer_eval((c + d) > 0); // expected-warning{{TRUE}}
		}

		vsavchenkoUnsubmitted Done Reply Inline Actions `clang_analyzer_eval` returns `TRUE` only if we can deduct that this expression on a particular path is always true. Here it's not going to hold. vsavchenko: `clang_analyzer_eval` returns `TRUE` only if we can deduct that this expression on a particular…
		vsavchenkoUnsubmitted Done Reply Inline Actions How can these two statements be TRUE at the same time? vsavchenko: How can these two statements be TRUE at the same time?
		manasAuthorUnsubmitted Done Reply Inline Actions Right, my bad. They both should be `UNKNOWN`. As, `c` and `d` are signed 32-bit positive integers, hence their respective values will be in `[1, INT_MAX]`. When `c == d == 1` then `c + d == 2`, and when `c == d == INT_MAX` then `c + d == -2` (overflow). Only possible values which `c + d` cannot attain are `{-1, 0, 1}`. As a simple proof: Dividing this range into R1 = [1, INT_MAX/2] and R2 = [(INT_MAX/2) + 1, INT_MAX] Now, `c + d` will have 4 combination of ranges to be solved: `R1 + R1` : this will never overflow as the maximum value it can attain will be `INT_MAX - 1` (when `c == d == INT_MAX/2`) while in `R1 + R2`, `R2 + R1`, and `R2 + R2` expressions will overflow, leading value of `c + d` from `INT_MIN` to `-2`, except for case `INT_MAX/2 + (INT_MAX/2 + 1)`, where `c + d == INT_MAX`. Hence, only possible values which can never be attained by `c + d` will be `{-1, 0, 1}`. So, the range for our purposes will be `[INT_MIN, -2] U [2, INT_MAX]`. I think I should write tests for `c + d != {-1, 0, 1}` which will make more sense here. manas: Right, my bad. They both should be `UNKNOWN`. As, `c` and `d` are signed 32-bit positive…
		vsavchenkoUnsubmitted Done Reply Inline Actions I see, that's a good test! vsavchenko: I see, that's a good test!
		if (c < 0 && c >= -20 && d < 0 && d >= -40) {
		clang_analyzer_eval((c + d) < -1); // expected-warning{{TRUE}}
		clang_analyzer_eval((c + d) >= -60); // expected-warning{{TRUE}}
		}

		// Checks for integers with different sign bits
		if (c < 0 && d > 0) {
		if (c >= -20 && d <= 10) {
		clang_analyzer_eval((c + d) > -20); // expected-warning{{TRUE}}
		clang_analyzer_eval((c + d) < 10); // expected-warning{{TRUE}}
		}
		}

		// Checks for overlapping signed integers ranges
		if (c >= -20 && c <= 20 && d >= -10 && d <= 10) {
		clang_analyzer_eval((c + d) >= -30); // expected-warning{{TRUE}}
		clang_analyzer_eval((c + d) <= 30); // expected-warning{{TRUE}}
		}

		// Checks for positive signed integers
		if (c > 0 && d > 0) {
		clang_analyzer_eval((c + d) == 1); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == -1); // expected-warning{{FALSE}}
		}
		vsavchenkoUnsubmitted Done Reply Inline Actions I don't see the cases where we overflow on both ends and the case where we overflow on one end, but `Min > Max`. vsavchenko: I don't see the cases where we overflow on both ends and the case where we overflow on one end…
		manasAuthorUnsubmitted Done Reply Inline Actions My bad. I thought I added both overflows one. I will add them now. manas: My bad. I thought I added both overflows one. I will add them now.
		vsavchenkoUnsubmitted Done Reply Inline Actions This comment is marked as "Done", but I still don't these cases. vsavchenko: This comment is marked as "Done", but I still don't these cases.
		manasAuthorUnsubmitted Done Reply Inline Actions Amended. From line 260 and later. manas: Amended. From line 260 and later.

		// Check when Max overflows from positive-side
		if (c >= 10 && d >= 0 && d <= 10) {
		clang_analyzer_eval((c + d) == INT_MIN + 10); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == -1); // expected-warning{{FALSE}}
		}

		// Checks when Min overflows from negative side
		if (c <= 10 && d >= -10 && d <= 0) {
		clang_analyzer_eval((c + d) == 11); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == INT_MAX - 10); // expected-warning{{FALSE}}
		}

		// Checks producing overflowing range with different signs
		int HALF_INT_MAX = INT_MAX / 2;
		if (c >= HALF_INT_MAX - 10 && c <= HALF_INT_MAX + 10 &&
		d >= HALF_INT_MAX - 10 && d <= HALF_INT_MAX + 10) {
		// The resulting range for (c + d) will be:
		// [INT_MIN, INT_MIN + 18] U [INT_MAX - 21, INT_MAX]
		clang_analyzer_eval((c + d) <= INT_MIN + 18); // expected-warning{{UNKNOWN}}
		clang_analyzer_eval((c + d) >= INT_MAX - 21); // expected-warning{{UNKNOWN}}
		clang_analyzer_eval((c + d) == INT_MIN + 19); // expected-warning{{FALSE}}
		clang_analyzer_eval((c + d) == INT_MAX - 22); // expected-warning{{FALSE}}
		}
		}

This is an archive of the discontinued LLVM Phabricator instance.

[WIP][analyzer] Introduce range-based reasoning for addition operatorNeeds ReviewPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 363742

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp

clang/test/Analysis/constant-folding.c

[WIP][analyzer] Introduce range-based reasoning for addition operator
Needs ReviewPublic