This is an archive of the discontinued LLVM Phabricator instance.

[StaticAnalyzer] Try to calculate arithmetic result when operand has a range of possible values
AbandonedPublic

Authored by danielmarjamaki on Aug 8 2017, 8:12 AM.

Details

Summary

This is a fix for https://bugs.llvm.org/show_bug.cgi?id=32911

In the code below the division result should be a value between 5 and 25.

if (a >= 10 && a <= 50) {
  int b = a / 2;
}

This patch will calculate results for additions, subtractions and divisions.

I intentionally do not try to handle all possible cases that can be handled. I want to know if my approach is ok.

Diff Detail

Repository
rL LLVM

Event Timeline

danielmarjamaki created this revision.Aug 8 2017, 8:12 AM
danielmarjamaki edited the summary of this revision. (Show Details)Aug 8 2017, 8:16 AM
xazax.hun edited edge metadata.Aug 8 2017, 8:38 AM

Can't you reuse somehow some machinery already available to evaluate the arithmetic operators? Those should already handle most of your TODOs and overflows.

A minor code cleanup. No functional change.

Can't you reuse somehow some machinery already available to evaluate the arithmetic operators? Those should already handle most of your TODOs and overflows.

Sounds good.. I have not seen that machinery.. I will look around.

To me it seems it would be nice if this machinery was builtin in APSInt so I could calculate (x+y) even if x and y did not have the same signedness and that the result would be unsigned.

Refactoring, use BasicValueFactory::evalAPSInt

Should evalAPSInt() have machinery to do standard sign/type promotions? I suggest that I add one more argument bool promote = false, do you think that sounds good?

minor code cleanup

xazax.hun added a subscriber: NoQ.Oct 19 2017, 4:47 AM

I think this change is very useful but it is also important to get these changes right.
I think one of the main reason you did not get review comments yet is that it is not easy to verify that these changes are sound.

In general, there are false positives in the analyzer due to limits in the constraint manager (or missing parts in modeling the language). But in general, we try to avoid having false positives due to unsound assumptions (apart from some cases like assuming const methods will not change the fields of a class).

While the change you introduced is indeed very useful the soundness probably depends on the details of how promotions, conversions, and other corner cases are handled.
In order to introduce a change like this, we need to have those cases covered to ensure that we have the soundness we want and this needs to be verified with test cases.
Also once the solution is sound it would be great to measure the performance to ensure that we did not regress too much.

I understand that you do not want to work on something that might not get accepted but also with the available information it might be hard to decide whether this is a good approach to the problem or not.
But of course, I am just guessing here, @dcoughlin, @zaks.anna, @NoQ might have a different opinion.

A bit more technical comment: did you consider using SValBuilder's evalBinOpNN? I believe it already handles at least some of the conversions you did not cover here.

I have updated the patch so it uses evalBinOpNN. This seems to work properly.

I have a number of TODOs in the test cases that should be fixed. Truncations are not handled properly.

Here is a short example code:

void f(unsigned char X) {
  if (X >= 10 && X <= 50) {
    unsigned char Y = X + 0x100; // truncation
    clang_analyzer_eval(Y >= 10 && Y <= 50); // expected-warning{{FALSE}}
  }
}

The expected-warning should be TRUE but currently FALSE is written.

When the "Y >= 10" condition is evaluated the ProgramState is:

Store (direct and default bindings), 0x222ab0fe5f8 :
 (Y,0,direct) : (unsigned char) ((reg_$0<unsigned char X>) + 256)

Expressions:
 (0x222a96d6050,0x222ab0eb930) X + 256 : (unsigned char) ((reg_$0<unsigned char X>) + 256)
 (0x222a96d6050,0x222ab0eb960) clang_analyzer_eval : &code{clang_analyzer_eval}
 (0x222a96d6050,0x222ab0eb988) Y : &Y
 (0x222a96d6050,0x222ab0eb9d8) Y : (unsigned char) ((reg_$0<unsigned char X>) + 256)
 (0x222a96d6050,0x222ab0eb9f0) Y : (unsigned char) ((reg_$0<unsigned char X>) + 256)
 (0x222a96d6050,0x222ab0eba08) Y >= 10 : ((unsigned char) ((reg_$0<unsigned char X>) + 256)) >= 10
 (0x222a96d6050,0x222ab0ebb28) clang_analyzer_eval : &code{clang_analyzer_eval}
Ranges of symbol values:
 reg_$0<unsigned char X> : { [10, 50] }
 (reg_$0<unsigned char X>) + 256 : { [10, 50] }

It seems to me that the symbol initialization does not handle the range properly. Imho there is nothing wrong with the calculation. What you think about adding a range like below?

(unsigned char) ((reg_$0<unsigned char X>) + 256) : { [10, 50] }
danielmarjamaki abandoned this revision.Jan 15 2018, 12:32 AM

I will not continue working on this. Feel free to take over the patch or write a new patch.