This is an archive of the discontinued LLVM Phabricator instance.

[WIP][analyzer] Introduce range-based reasoning for addition operator
Needs ReviewPublic

Authored by manas on Jun 1 2021, 1:38 AM.

Details

Summary

Add logic to reason about RangeSet for BO_Add.

Diff Detail

Event Timeline

manas created this revision.Jun 1 2021, 1:38 AM
manas requested review of this revision.Jun 1 2021, 1:38 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 1 2021, 1:38 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Hey, great start!
I added my comments inline and other mentors as reviewers.

clang/test/Analysis/constant-folding.c
305

I think this is always true.

317

I'm not sure I understand why do you make this assumption.

320–321

Here is the same note, clang_analyzer_eval doesn't produce TRUE only because this condition might be true.

328

If c == d == 1 this doesn't hold. Did you want to check here that we account for overflows?

332

What about other cases here? I mean specifically when two ranges are have a form [A, B]

335

clang_analyzer_eval returns TRUE only if we can deduct that this expression on a particular path is always true. Here it's not going to hold.

xazax.hun added inline comments.Jun 1 2021, 8:45 PM
clang/test/Analysis/constant-folding.c
295

Since both a and b are unsigned values, their sum is also unsigned. We basically check if their sum could be zero. I don't see why it could not in this case (e.g. when b == UINT_MAX - a + 1). Do I miss something?

296

If a == INT_MAX + 1 && b == 0 then (a + b) > 0. Do I miss something? Or is this a bug?

305

So is a >= 0 && b >= 0 unless I miss something.

312

I think UINT_MAX + UINT_MAX supposed to be positive. Do I miss something here?

NoQ added a comment.Jun 1 2021, 9:53 PM

Our pre-commit testing shows that the new tests fail, eg.:

https://buildkite.com/llvm-project/premerge-checks/builds/41391#b557c86c-0587-4aee-a06b-8a4de6d771c4

Failed Tests (1):
  Clang :: Analysis/constant-folding.c

Did you figure out how to *run* the tests locally? Tests for the static analyzer are in the check-clang-analysis target.

manas added a comment.Jun 1 2021, 10:28 PM

@NoQ I figured out the tests but while testing against Z3, I mixed up constraints. I am changing those.

clang/test/Analysis/constant-folding.c
295

I am editing that condition to be b >= INT_MAX.

305

@vsavchenko true, I am removing that conditional as it serves no purpose there.

@xazax.hun that's true.

312

Should not UINT_MAX + UINT_MAX wrap around to be negative? I think I misunderstood it.

320–321

understood.

328

Yes, I was checking for that. I am changing that case.

332

I am adding more cases here.

manas added inline comments.Jun 1 2021, 11:12 PM
clang/test/Analysis/constant-folding.c
312

Understood. My bad.

Our pre-commit testing shows that the new tests fail, eg.:

https://buildkite.com/llvm-project/premerge-checks/builds/41391#b557c86c-0587-4aee-a06b-8a4de6d771c4

Failed Tests (1):
  Clang :: Analysis/constant-folding.c

Did you figure out how to *run* the tests locally? Tests for the static analyzer are in the check-clang-analysis target.

They are actually supposed to fail, we first decide on the scope of what @manas will support and implement the actual feature later.

manas updated this revision to Diff 349225.Jun 2 2021, 4:08 AM

Fixed test cases expecting wrong assertions and added few more test cases.

manas marked 10 inline comments as done.Jun 2 2021, 9:23 PM

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is `[INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

clang/test/Analysis/constant-folding.c
309

Good!

312

Great, it's good to check negative numbers.
I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication.

if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10;
               // one goes inside of if, one doesn't
  . . .
}
if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10).
              // 1 path was infeasible: a == 10 and a < 5
              // Two of these paths go inside of the if, one doesn't
  . . .
  clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE
}
clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE

We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each clang_analyzer_eval(...)

In this example, you still have residual paths where c != INT_MIN, c == INT_MIN and d != INT_MIN, and c == INT_MIN and d == INT_MIN.

334–335

How can these two statements be TRUE at the same time?

manas added a comment.Jun 3 2021, 7:23 PM

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is [INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

If I understood correctly, does a case like: c, d in [INT_MAX/2 - 10, INT_MAX/2 + 10] works? It will produce an overflowing range of [INT_MIN, INT_MIN + 18] U [INT_MAX - 21, INT_MAX]. I will add that to the test-set, if that is so.

For now, I have added tests for residual paths and negations of particular values, and edited buggy tests.

clang/test/Analysis/constant-folding.c
312

I should add tests for these paths as well so that these can be checked. For further cases, I will enforce single path evaluation in test cases (which will make it easier to handle here).

334–335

Right, my bad. They both should be UNKNOWN.

As, c and d are signed 32-bit positive integers, hence their respective values will be in [1, INT_MAX].

When c == d == 1 then c + d == 2, and when c == d == INT_MAX then c + d == -2 (overflow). Only possible values which c + d cannot attain are {-1, 0, 1}.

As a simple proof:

Dividing this range into

R1 = [1, INT_MAX/2] and R2 = [(INT_MAX/2) + 1, INT_MAX]

Now, c + d will have 4 combination of ranges to be solved:

  • R1 + R1 : this will never overflow as the maximum value it can attain will be INT_MAX - 1 (when c == d == INT_MAX/2)
  • while in R1 + R2, R2 + R1, and R2 + R2 expressions will overflow, leading value of c + d from INT_MIN to -2, except for case INT_MAX/2 + (INT_MAX/2 + 1), where c + d == INT_MAX.

Hence, only possible values which can never be attained by c + d will be {-1, 0, 1}. So, the range for our purposes will be [INT_MIN, -2] U [2, INT_MAX].

I think I should write tests for c + d != {-1, 0, 1} which will make more sense here.

manas updated this revision to Diff 349749.Jun 3 2021, 7:30 PM

Added tests for residual paths and negation of certain values, and fixed expected warnings for UNKNOWN cases.

vsavchenko added inline comments.Jun 4 2021, 1:18 AM
clang/test/Analysis/constant-folding.c
312

In these tests we should really avoid having multiple feasible paths going through one clang_analyzer_eval. We can add // expected-warning to it multiple times, but it's harder to understand as a reader of these tests because you need to split all the paths in your mind just like the analyzer does. It can be useful when testing other features of the analyzer, but here it's not essential.

And I don't think that you fully understood me. When multiple paths go through clang_analyzer_eval, each of them produces a separate "warning".
That's why if you want to account for residual paths, you shouldn't add more if statements, but add something like:

clang_analyzer_eval(a == 10); // expected-warning{{TRUE}}
                              // expected-warning@-1{{TRUE}}
                              // expected-warning@-2{{FALSE}}

So, please, instead of writing tests like this, rewrite your test, so that we don't have residual paths.

334–335

I see, that's a good test!

I also would like to see tests where the ranges are not going all the way to either INT_MIN or INT_MAX (if we talk about int), but overflow still might happen, and cases where overflow might happen, but we still can identify the overflowing results precisely (e.g. the result is [INT_MIN, INT_MIN + 10] and [INT_MAX - 5, INT_MAX])

If I understood correctly, does a case like: c, d in [INT_MAX/2 - 10, INT_MAX/2 + 10] works? It will produce an overflowing range of [INT_MIN, INT_MIN + 18] U [INT_MAX - 21, INT_MAX]. I will add that to the test-set, if that is so.

Yes, exactly!

manas added inline comments.Jun 4 2021, 2:30 AM
clang/test/Analysis/constant-folding.c
312

Understood. I am changing those for single path evaluation accordingly.

manas updated this revision to Diff 349796.Jun 4 2021, 2:30 AM

Fix for cases involving residual paths and add case for overflowing range near extremum of a type

xazax.hun added a comment.EditedJun 4 2021, 11:38 AM

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

For example:

(declare-const a ( _ BitVec 32 ))
(declare-const b ( _ BitVec 32 ))

; Assume 0 <= a, b, <= 5
(assert (bvsge a #x00000000))
(assert (bvsge b #x00000000))
(assert (bvsle a #x00000005))
(assert (bvsle b #x00000005))

; Check if 0 <= a + b  <= 10
(assert (not (and (bvsle (bvadd a b) #x0000000A) (bvsge (bvadd a b) #x00000000))))

(check-sat)

Of course, the example above is obvious, but with overflows and other corner cases it is great to have a tool doublecheck our assumptions, and share how to reproduce the results.

manas added a comment.Jun 4 2021, 3:23 PM

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

For example:

(declare-const a ( _ BitVec 32 ))
(declare-const b ( _ BitVec 32 ))

; Assume 0 <= a, b, <= 5
(assert (bvsge a #x00000000))
(assert (bvsge b #x00000000))
(assert (bvsle a #x00000005))
(assert (bvsle b #x00000005))

; Check if 0 <= a + b  <= 10
(assert (not (and (bvsle (bvadd a b) #x0000000A) (bvsge (bvadd a b) #x00000000))))

(check-sat)

Of course, the example above is obvious, but with overflows and other corner cases it is great to have a tool doublecheck our assumptions, and share how to reproduce the results.

I usually add things like this not about the tests, but about the implementation. I think it provides a bit more generality. But usually I do it as a comment to the patchset, maybe it is also good to put it in the code as well.

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

Hmm, so you mean we can check if the analyzer was compiled with Z3 and if so, verify the same things by it?

manas added a comment.EditedJun 5 2021, 7:02 AM

I was wondering, if we could try something new with the tests. To increase our confidence that the expected behavior is correct, how about including a Z3 proof with each of the test cases?

We are looking forward to design a unit-test framework for the solver which can compact the test cases and make them much more manageable (unlike constant-folding.c). Perhaps, we can incorporate the Z3 proves in that framework, corresponding to test cases.

Hmm, so you mean we can check if the analyzer was compiled with Z3 and if so, verify the same things by it?

Yeah in some sense. But I think that having proof for every test case may become redundant for certain cases.

For e.g., consider two test cases for addition and subtraction operator:

  1. c == [0, 10] and d == [-10, 0] will result in (c + d) == [-10, 10]
  2. c == d == [0, 10] will result in (c - d) == [-10, 10]

But the first test case can be modeled as c - (- d)equivalently c - D, that is,

  • usage of subtraction binary operator : (c - D), and
  • symmetrical inversion of range around origin (- d) for symbol d. This will shift the range from [-10, 0] to [0, 10].

Considering having proof for every test case will make the proof for test-case 1 kind of redundant in comparison to test-case 2.

So, I think we should go with @vsavchenko 's method of adding Z3 proof with the implementation (in code), instead of test cases themselves.

manas updated this revision to Diff 352283.Jun 15 2021, 4:05 PM
manas edited the summary of this revision. (Show Details)
manas added a comment.Jun 15 2021, 4:05 PM

Add logic for computing rangeset for an expression containing BO_Add.

manas added inline comments.Jun 15 2021, 4:09 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413–1414

I changed the logic from using getCrucialPoints (which I mentioned in the e-mail thread) to simple checks to determine overflows using signedness. But the failed assertions again popped up. And I was unable to pin-point the issue here. Can someone help me?

Although, I am thinking of revising the above logic by using bitwise methods to detect overflows.

vsavchenko added inline comments.Jun 16 2021, 1:29 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1409

This clause is exactly the same as the previous one, it is a mistake.
And I think we should have a test that could've shown that.
Also, since you are checking for overflows for both the beginning and the end, we should have tests where both overflow.

1413–1414

This is actually one place that I won't expect an assertion failure.
Can we get a bit more detail on it? Is it again From > To (which will indeed be weird) or something different?

NOTE: Asking for help (either here or on StackOverflow) is a lot like filing a bug report, try to put as much information as possible, try to structure this information so it is easy to follow. It's also good to tell people what you tried, instead of saying that you tried something and it didn't work.
1421–1427

I thought we talked quite a lot that there is nothing bad with overflows and here we have that if ANY overflow happened, we bail out and don't give any result.

vsavchenko added inline comments.Jun 16 2021, 2:36 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413–1414

OK, I downloaded your patch and ran the debugger.

It complains about different bit-width for ranges that the analyzer tries to intersect. What I checked next: what are those ranges, so I took begin() from one of them and checked what are From and To there.
Here is one of them:

(const llvm::APSInt) $8 = {
  llvm::APInt = {
    U = {
      VAL = 22
      pVal = 0x0000000000000016
    }
    BitWidth = 4022311424
  }
  IsUnsigned = true
}

Woah, this BitWidth seems ridiculous! What does it tell us? It definitely didn't get there as part of any reasonable logic, right? So, what can it be instead? Only one answer here - garbage. We have some sort of memory issue with integers that we use as part of our ranges! So, let's see what type of information Range class actually stores:

class Range {
public:
  Range(const llvm::APSInt &From, const llvm::APSInt &To) : Impl(&From, &To) {
    assert(From <= To);
  }

...

private:
  std::pair<const llvm::APSInt *, const llvm::APSInt *> Impl;
};

What we have here is a pair of pointers, and you have:

llvm::APSInt Min = LHS.From() + RHS.From();
llvm::APSInt Max = LHS.To() + RHS.To();
llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType);
llvm::APSInt Tmax = ValueFactory.getMaxValue(ResultType);

These are ALL stack variables. So, pointer to those point to adequate data ONLY while we are inside of our functions. When we call another function, these pointers point into some data from some other function in the call stack, it doesn't point to anything even remotely resembling llvm::APSInt and we get BitWidth = 4022311424.

OK, we figured out the problem, what about a solution? If you look at other implementations, you can notice a couple of interesting things.
When you do:

llvm::APSInt Tmin = ValueFactory.getMinValue(ResultType);

other code does something like:

const llvm::APSInt &Tmin = ValueFactory.getMinValue(ResultType);

Or when you do:

Range Second = Range(Min, Tmax);

The sibling code does:

return {RangeFactory, ValueFactory.getValue(Min), ValueFactory.getValue(Max)};

It looks like ValueFactory is the key here! It actually manages the lifetime of those integers for you and, whenever you ask it about values, it gives you llvm::APSInt & (note the reference part) that will have the correct lifetime.

I hope this gives you some insight into how you can debug things like this on your own and how you can reason about what you see.

Another piece of advice is to look around, other VisitBinaryOperator methods have all the information you actually needed. If you don't understand why we need ValueFactory, - experiment, ask us! It's bad to just ignore it.

1417–1418

Why not just return RangeFactory.add(ResultRangeSet, Second)?

NOTE: variables with integers in their names is a big no-no.
manas added inline comments.Jun 16 2021, 4:31 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1409

Understood! I will add tests to check each OR part of these conditionals in these cases.

1413–1414

Got it! I am working on fixing this.

1417–1418

Right! I assumed I have to refactor this part so I used this temporarily.

1421–1427

Understood! Should I replace it with code returning EmptySet()?

vsavchenko added inline comments.Jun 16 2021, 5:39 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427

Why EmptySet()? EmptySet() means that this can never happen, this path is infeasible. Is that the case?
Let's say we have: [INT_MAX - 20, INT_MAX - 10] + [30, 40] what happens in this case?

manas added inline comments.Jun 16 2021, 6:16 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427

Right! That will not be the case.

In this particular case, the range will be [INT_MIN + 9, INT_MIN + 29] which is far smaller than [Tmin, Tmax].

Also, I think I misunderstood the part of bailing out and not giving any result as returning empty.

vsavchenko added inline comments.Jun 16 2021, 6:20 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427

Gotcha!

Because of cases like this, you need to re-think that part about Min > Max and maybe count the number of overflows on each side?

manas added inline comments.Jun 16 2021, 6:36 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1421–1427

That's right. If it is able to deduce how many times overflows are occurring then it can reason about whether it will be MaxRangeSet or a subset of it. Fixing it!

manas updated this revision to Diff 353168.Jun 18 2021, 11:14 PM

Reason about cases where Min > Max

The diff fixes all invalid assertion issues and also reasons about the cases where Min > Max.

One thing which is stuck for me is the case where Min <= Max but it overflows. I could reason about that in this way:

  1. If one of Min/Max overflows while the other doesn't then the resulting range should be [Tmin, Tmax] as Min <= Max even after overflowing.
  2. If both of them overflows, then: a. If both overflows on the same side, that is, both wrapped around Tmax, or both wrapped around Tmin, then the range should be [Min, Max] only. b. But if both overflowed in different sides, supposedly Min overflowed on left and Max on right, or vice versa, then the range will be [Tmin, Tmax].

Will it work?

clang/test/Analysis/constant-folding.c
310–311

When I pull these cases out to a separate function (say testAdditionRules2) in constant-folding.c then algorithm is able to reason about the range correctly, but inside testAdditionRules, it is unable to figure stuff out. Does it have something to do with constraints being propagated which we discussed below?

@vsavchenko wrote:

I have one note here. The fact that we don't have a testing framework and use this approach of inspecting the actual analysis has an interesting implication.

if (a == 10) { // here we split path into 2 paths: one where a == 10, and one where a != 10;
               // one goes inside of if, one doesn't
  . . .
}
if (a >= 5) { // here we split into 3 paths: a == 10, a < 5, and a in [5, 9] and [11, INT_MAX] (aka a >= 5 and a != 10).
              // 1 path was infeasible: a == 10 and a < 5
              // Two of these paths go inside of the if, one doesn't
  . . .
  clang_analyzer_eval(a == 10); // it will produce two results: TRUE and FALSE
}
clang_analyzer_eval(a == 10); // it will produce three results: TRUE, FALSE, and FALSE

We don't want to test how path splitting works in these particular tests (they are about solver and constant folding after all), that's why we try to have only one path going through each clang_analyzer_eval(...)

manas marked 11 inline comments as done.Jun 20 2021, 10:08 PM
vsavchenko added inline comments.Jun 21 2021, 1:22 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397

I commented on this part previously, you shouldn't get fixated on Min > Max because Max >= Min doesn't imply that overflow didn't happen (less important) and that the range [Min, Max] is correct (more important). Recall one of the examples that I had in that email.

There is an interesting pattern about results that we can use:

  • no overflows happened -> [Min, Max] (should be always true)
  • 1 overflow happened -> we need to invert the range, but there are two different cases:
    • Min > Max, perfect, that's what we expected -> [Tmin, Max] ∪ [Min, Tmax]
    • Max >= Min, we overflowed and wrapped the whole range -> [Tmin, Tmax]
  • 2 overflows happened on one side -> [Min, Max]
  • 2 overflows happened on both sides -> [Tmin, Tmax]

You need to think of this problem in terms of what really happens and not in terms of combating with assertions.

1413

What if LHS.From() is Tmin for signed T?
What if T is unsigned? Does Tmin - LHS.From() (aka 0 - LHS.From()) make sense?

1450–1453

Speaking of unsigned T, does it work for unsigned overflows? Do we have tests for that?

clang/test/Analysis/constant-folding.c
310–311

It might be that or it might be something different. Just by looking at this example, the previous if statement shouldn't add more paths that go inside of this if, so it shouldn't be the root cause.
Whenever you encounter problems and you want to tell other people, please, provide more detail. Did you notice it when running the test case? What was the output? What did you try? How did you extract it into a separate function?

manas added inline comments.Jun 21 2021, 1:57 AM
clang/test/Analysis/constant-folding.c
310–311

I put a new test function in constant-folding.c as:

void testAdditionRulesNew(int c, int d) {
  if (c < 0 && c != INT_MIN && d < 0) {
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
  }
}

I tested this specific function as:

./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRulesNew constant-folding.c

And I got the following output:

../clang/test/Analysis/constant-folding.c:338:5: warning: FALSE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}

which is correct.

But when I ran the same test inside testAdditionRules, using:

./build/bin/clang -cc1 -analyze -analyzer-checker=core,debug.ExprInspection -analyze-function=testAdditionRules constant-folding.c

then I got:

../clang/test/Analysis/constant-folding.c:281:5: warning: FALSE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~                                          
../clang/test/Analysis/constant-folding.c:281:5: warning: TRUE [debug.ExprInspection]
  clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}                                                                                            
  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Here, c = [INT_MIN + 1, -1] and d = [INT_MIN, 0], so c + d = [INT_MIN, -2] U [1, INT_MAX]. So c + d == 0 should be false. But in latter case, it is reasoning c + d == 0 to be UNKNOWN.

Also, the error arises in c + d == 0 case only and not in c + d == -1 case. I mistakenly highlighted that case while commenting.

vsavchenko added inline comments.Jun 21 2021, 4:24 AM
clang/test/Analysis/constant-folding.c
310–311

Hmm, I don't know what can be the reason.

There are three reasonable approaches:

  1. Straightforward: use debugger. Find a place when we ask the solver about this assumption and dig in.
  2. Print-based: use clang_analyzer_printState inside of this if. Among other things it will print you all constraints that we know about, you can check if they match your expectations. Also, if it is indeed because of residual paths, you will get another print from residual path.
  3. Reduction: add the whole testAdditionRules into testAdditionRulesNew and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on.

These are not mutually exclusive, so you can try them in combination.

manas added inline comments.Jun 21 2021, 6:41 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1413

Even if LHS.From() is Tmin for signed T the reasoning would be correct. As, CrucialPoint will be Zero and hence RHS will be divided into two parts: BeforeZero ([RHS.From, Zero]) and AfterAndIncludingZero ([Zero, RHS.To]).

And if the BeforeZero subrange (which will be responsible for overflowing, contains Tmin + Tmax then we can ensure that the final range will be [Tmin, Tmax]. And if it does not, (and assuming there is no overflow on the right side), then 1432:1434 will reason the range to be [Tmin, Max] U [Min, Tmax].

For unsigned types, actually we are asking the wrong question because an addition of unsigned integers can never overflow from the left side. So in that case, the code deducing the overflows from left-side will not be applicable. And therefore, 0 - LHS.From() will never give us anything. I was thinking to checked T before applying these rules.

1450–1453

This was not going to make into the patch as it does not solve the case mentioned in FIXME(:1437).

An idea which I had in the previous post was this:

  1. If one of Min/Max overflows while the other doesn't then the resulting range should be [Tmin, Tmax] as Min <= Max even after overflowing.
  2. If both of them overflows, then: a. If both overflows on the same side, that is, both wrapped around Tmax, or both wrapped around Tmin, then the range should be [Min, Max] only. b. But if both overflowed in different sides, supposedly Min overflowed on left and Max on right, or vice versa, then the range will be [Tmin, Tmax].
manas added inline comments.Jun 21 2021, 6:51 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1397

I think I agree with you. This pattern (and the code following this pattern) makes much more sense, and easier to follow. I think I will refactor the patch accordingly.

manas added inline comments.Jun 21 2021, 9:17 PM
clang/test/Analysis/constant-folding.c
310–311

I tried print the ProgramState as in both the cases like:

if (c < 0 && c != INT_MIN && d < 0) {
  clang_analyzer_printState();
  clang_analyzer_eval(...);
  ...
}
  1. During testAdditionRulesNew, when I added clang_analyzer_printState just before doing any clang_analyzer_eval calls, I got the following constraints (these are as expected):
"constraints": [                                                                                                                                             
  { "symbol": "reg_$0<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
  { "symbol": "reg_$1<int d>", "range": "{ [-2147483648, -1] }" }                                                                                            
],
  1. In testAdditionRules, adding printState before any evaluation led to:
"constraints": [                                                                                                                                             
  { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },                                                                                           
  { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },                                                                                           
  { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }                                                                          
],

"constraints": [
  { "symbol": "reg_$2<int c>", "range": "{ [-2147483647, -1] }" },
  { "symbol": "reg_$3<int d>", "range": "{ [-2147483648, -1] }" },
  { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }
],

A point to note is that this ProgramState is before any clang_analyzer_eval calls have been made. So, the third constraint, in both sets of constraints (for c + d) should not exist there, that is,

a. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [0, 2147483647] }" }
b. { "symbol": "(reg_$2<int c>) + (reg_$3<int d>)", "range": "{ [-2147483648, -2] }" }

should not be in the constraints set.

And as obvious, the UNKNOWN reasoning for c + d == 0 in testAdditionRules is arising from constraint (a) above, which is a wrong reasoning (it ideally should be [1, INT_MAX] and not [0, INT_MAX]).

manas added inline comments.Jun 21 2021, 11:17 PM
clang/test/Analysis/constant-folding.c
310–311

@vsavchenko wrote:

  1. Reduction: add the whole testAdditionRules into testAdditionRulesNew and start removing parts of it. At some point problem should disappear (based on your previous observations). That "minimal" reproducer can give you insight into what's going on.

Trying this, I was able to deduce that the following snippet is causing UNKNOWN triggers:

void testAdditionRulesNew(unsigned int a, unsigned int b, int c, int d) {
  if (c < 0 && d < 0) {
    clang_analyzer_printState();
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    clang_analyzer_printState();
  }

  if (c < 0 && c != INT_MIN && d < 0) {
    clang_analyzer_printState();
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    clang_analyzer_printState();
  }
}

The output warnings are as follows:

<snipped-ProgramStates>
../clang/test/Analysis/constant-folding.c:341:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:341:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:347:5: warning: FALSE [debug.ExprInspection]
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../clang/test/Analysis/constant-folding.c:347:5: warning: TRUE [debug.ExprInspection]
    clang_analyzer_eval((c + d) == 0); // expected-warning{{FALSE}}
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.

The second evaluation is leading to wrong reasoning. I think the constraints are being propagated somehow from first conditional (c < 0 && d < 0) to the second conditional (c < 0 && c != INT_MIN && d < 0).

I have added a pastebin using clang_analyzer_printState here: https://pastebin.com/0E8hmTWh

vsavchenko added inline comments.Jun 22 2021, 2:38 AM
clang/test/Analysis/constant-folding.c
310–311

Ooof, OK. I know what's going on here.

clang_analyzer_eval actually assumes its condition. So it works as expected when the result is TRUE or FALSE, but actually splits the path when UNKNOWN.
This being said, the first conditional doesn't affect you (it is weaker than the second condition), clang_analyzer_eval((c + d) < 0); // expected-warning{{UNKNOWN}} does.
After that point, you have two states: one where c + d >= 0 (you called it a.) and the other with c + d < 0 (b.).

Another important point here is located in SymbolicRangeInferrer::infer(SymbolRef Sym). You can see that there are three returns in that function. Your functionality lives in the very last one, namely in Visit(Sym). What it means is that we ALWAYS give preference to existing constraints. And if we don't have any, we start traversing symbolic expressions to get something from that level.
This logic is flawed because we can be in a situation like you encountered now. Where we believe that c + d >= 0, but if we look at c and d, we can see that this is actually impossible.
It can be solved by always doing Visit(Sym) and intersecting results from ALL available sources. It has a simple implementation, but requires additional performance testing.

Long story short, that's not your problem. Try to avoid it/rewrite the test.

manas updated this revision to Diff 353825.Jun 22 2021, 4:57 PM

Added updated logic for reasoning using number of overflows. Also, changed a couple of tests which were leading to unwanted constriants being propagated further.

manas marked 7 inline comments as done.Jun 22 2021, 4:59 PM
manas added a comment.Jun 22 2021, 5:05 PM

Regarding the tweakings in constant-folding.c, I have refrained from using cases which were resulting in UNKNOWN assertions as they were the primary reason for constraints being propagated.

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074

We should have these specific functions for other BO as well. Because they will lead us to reason about when Operand1 binop Operand2 can overflow or not. I was thinking in the direction of having a simpler class which works for this.

manas marked an inline comment as not done.Jun 22 2021, 6:29 PM

Hey, it looks like we are finally converging on this one! Great job!

NOTE: I don't know if you noticed, but I want to point out that there are three failing tests:
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074

I searched through a codebase a bit and here are the couple of functions (from APInt.h) that can come in handy:

// Operations that return overflow indicators.
APInt sadd_ov(const APInt &RHS, bool &Overflow) const;
APInt uadd_ov(const APInt &RHS, bool &Overflow) const;
APInt ssub_ov(const APInt &RHS, bool &Overflow) const;
APInt usub_ov(const APInt &RHS, bool &Overflow) const;
APInt sdiv_ov(const APInt &RHS, bool &Overflow) const;
APInt smul_ov(const APInt &RHS, bool &Overflow) const;
APInt umul_ov(const APInt &RHS, bool &Overflow) const;
APInt sshl_ov(const APInt &Amt, bool &Overflow) const;
APInt ushl_ov(const APInt &Amt, bool &Overflow) const;

APSInt is derived from APInt, so we can totally use these.

clang/test/Analysis/constant-folding.c
299

They are already unsigned, we don't need a >= 0 and b >= 0

360

I don't see the cases where we overflow on both ends and the case where we overflow on one end, but Min > Max.

manas added a comment.EditedJun 23 2021, 1:53 AM

I am working on debugging those 3 cases.

Thanks @vsavchenko and everyone for helping! :)

clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1073–1074

That's great! I will incorporate those in the code instead of custom functions.

clang/test/Analysis/constant-folding.c
299

Right! I will remove those constraints.

360

My bad. I thought I added both overflows one. I will add them now.

manas updated this revision to Diff 354154.Jun 23 2021, 10:25 PM

Fix issues involving cases for unsigned type and add tests

manas added inline comments.Jun 23 2021, 10:34 PM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

Using uadd_ov (and sadd_ov), we can get the added value as well as whether overflow occurred or not. A point is that these functions return APInt instead of APSInt.

But when I tried just using:

Min = LHS.From().uadd_ov(RHS.From(), HasMinOverflowed);
Max = LHS.To().uadd_ov(RHS.From(), HasMaxOverflowed);

instead of

Min = LHS.From() + RHS.From();
Max = LHS.To() + RHS.To();

just for the added value, then the following tests failed (these tests and all other tests pass when I use the latter method to get Min/Max):

Clang :: Analysis/PR3991.m
Clang :: Analysis/global-region-invalidation.c
Clang :: Analysis/malloc-overflow2.c
Clang :: Analysis/out-of-bounds-new.cpp
Clang :: Analysis/taint-generic.c

I am working on fixing this part.

manas marked 3 inline comments as done.Jun 23 2021, 11:03 PM
vsavchenko added inline comments.Jun 24 2021, 1:49 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

You can easily construct APSInt from APInt using APSInt ::APSInt(APInt I, bool isUnsigned) constructor.

manas marked an inline comment as not done.Jun 24 2021, 1:52 AM
manas added inline comments.
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

Okay. I will try with using uadd_ov only then. And check whether those tests pass or not.

vsavchenko added inline comments.Jun 24 2021, 2:13 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

Hmm, why only uadd_ov? What about those tests? How do they fail? Try to look at the reasons and not brute-force by trying different solutions blindly.
Those tests are your friends, it's much better to get failures right now then getting them later when you land the patch.

manas added inline comments.Jun 24 2021, 2:26 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

I meant related functions as well.

Most of them were failing due to Error evaluating statements. Although, I have rebased my tree so I will re-check again.

vsavchenko added inline comments.Jun 24 2021, 2:44 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

Error evaluating statements just tells you when we crashed (for the context and fast insight).
Below this message is usually the real reason (and the last time I checked, it was assertion From > To failure).

manas added inline comments.Jun 24 2021, 3:09 AM
clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp
1402

My bad. I took a look and found, it was signed mismatch!

manas updated this revision to Diff 354424.Jun 24 2021, 7:54 PM

Fix issues involving usage of uadd_ov and family of functions

manas marked 3 inline comments as done.Jun 24 2021, 7:55 PM
vsavchenko added inline comments.Jun 24 2021, 11:48 PM
clang/test/Analysis/constant-folding.c
294–296

We need a test with ranges and unsigned overflow

360

This comment is marked as "Done", but I still don't these cases.

manas updated this revision to Diff 354441.Jun 24 2021, 11:58 PM

Add tests for overflows on both ends

manas added inline comments.Jun 24 2021, 11:59 PM
clang/test/Analysis/constant-folding.c
360

Amended. From line 260 and later.

manas updated this revision to Diff 355825.Jul 1 2021, 3:57 AM

Reformat comments

manas updated this revision to Diff 356250.Jul 2 2021, 12:43 PM

Replace literal-value 0 with APSInt Zero

manas marked 2 inline comments as done.Jul 20 2021, 3:17 PM

Here is the proof of correctness of the algorithm using Z3: https://gist.github.com/weirdsmiley/ad6a9dbf3370e96d29f9e90068931d25

Here is the proof of correctness of the algorithm using Z3: https://gist.github.com/weirdsmiley/ad6a9dbf3370e96d29f9e90068931d25

There is a couple of fundamental problems there that you need to resolve. I left my comments there.

clang/test/Analysis/constant-folding.c
290

This looks like an overflow on one end (we couldn't have them on both ends with addition and unsigned integers) and wrapping around surpassing the beginning of the range.

manas updated this revision to Diff 363734.Aug 3 2021, 7:39 AM

Fix test comments

vsavchenko added inline comments.Aug 3 2021, 7:40 AM
clang/utils/analyzer/Dockerfile
15–22 ↗(On Diff #363734)

Changes to clang/utils/analyzer/* don't belong to this patch

manas added inline comments.Aug 3 2021, 7:42 AM
clang/utils/analyzer/Dockerfile
15–22 ↗(On Diff #363734)

I messed up while arc updating. :(

manas updated this revision to Diff 363742.Aug 3 2021, 7:49 AM

Fix unrelated commits mess up!