This is an archive of the discontinued LLVM Phabricator instance.

New static analyzer checker for loss of sign/precision
ClosedPublic

Authored by danielmarjamaki on Sep 24 2015, 1:42 AM.

Details

Summary

This is a new static analyzer checker that warns when there is loss of sign and loss of precision.

It is similar in spirit to Wsign-compare/Wsign-conversion etc. But this checker uses proper analysis so the output is much more meaningful.

It has been tested on debian packages.

loss of precision results:
https://drive.google.com/file/d/0BykPmWrCOxt2WFV3NVhJdE94QUE/view
2195 projects, 1026 warnings

loss of sign results:
https://drive.google.com/file/d/0BykPmWrCOxt2cUpSQVlmUVJmR0k/view
2195 projects, 3613 warnigs

It seems to me that this checker shows that Clang does not always properly track values. It seems to think that result of ! can be negative for instance.

Diff Detail

Repository
rL LLVM

Event Timeline

danielmarjamaki retitled this revision from to New static analyzer checker for loss of sign/precision.
danielmarjamaki updated this object.
danielmarjamaki added a reviewer: zaks.anna.
danielmarjamaki updated this object.
danielmarjamaki added a subscriber: cfe-commits.
mgehre added a subscriber: mgehre.Sep 28 2015, 1:40 PM
zaks.anna edited edge metadata.Oct 1 2015, 5:20 PM

This checker produces a lot of warnings! Have you analyzed how many are false positives? Have you tried reporting these warnings?

It's hard to make use of the results you posted from the debian packages. For most of them, I cannot tell if they are valid reports or false positives. For example, why this one is a valid warning:

ftp://ftp.sunet.se/pub/Linux/distributions/Debian/debian/pool/main/a/autotrace/autotrace_0.31.1.orig.tar.gz
output-emf.c:164:9: warning: Loss of precision

outch = (UI8) (data & 0x0FF);

It might be more useful if you could print the paths on which the errors occurred (this could be done for text output with -analyzer-output=text). Of cause, that assumes that the important information on why the issue occurs is highlighted on the path. (I am not sure if you'll have that without adding a BugReporterVisitor.)

It seems to me that this checker shows that Clang does not always properly track values. It seems to
think that result of ! can be negative for instance.

What do you mean? I do not see any tests with '!'.

test/Analysis/conversion.c
54 ↗(On Diff #35596)

This is not a good test because the analyzer unrolls loops only a very small number of times.

It might be more useful if you could print the paths on which the errors occurred (this could be done for text output with -analyzer-output=text)

Sounds good. Is it possible to use it with scan-build?

lib/StaticAnalyzer/Checkers/CMakeLists.txt
29 ↗(On Diff #35596)

hmm.. I will move this down 1 line

It might be more useful if you could print the paths on which the errors occurred (this could be done for text output with -analyzer-output=text)

Sounds good. Is it possible to use it with scan-build?

Sorry. I saw that this has been discussed recently on cfe-dev.

http://lists.llvm.org/pipermail/cfe-dev/2015-October/045292.html

danielmarjamaki updated this revision to Diff 40505.EditedNov 18 2015, 7:18 AM
danielmarjamaki edited edge metadata.

I have problems with the "default" handling of expressions.

I want to warn about loss of precision for such code:

unsigned int x = 256;
unsigned char c;
c = x;

The analyser tells me that the RHS value is 0 instead of 256 in the assignment "c = x". Here is the dump of the ProgramState:

Store (direct and default bindings), 0x0 :

Expressions:
(0x7b9bd30,0x7b458d8) c : &c
(0x7b9bd30,0x7b45940) x : 0 U8b
Ranges are empty.

This patch is a quick proof-of-concept where I track variable values myself using ProgramState. It gives me better results when I scan projects than when I try to use the "default" ProgramState.

I wonder if you think I should continue working on this proof-of-concept method. Or if you have some tip how I can see the untruncated rhs value in a assignment.

ping..

the latest patch has an alternative approach, where the checker track values of symbols.

REGISTER_MAP_WITH_PROGRAMSTATE(DeclVal, const ValueDecl *, SVal)

The reason I don't use normal ProgramState values is that these symbol values are truncated.

I wonder what you think about this alternative approach. if I finish this, will it have a chance to be accepted?

NoQ added a subscriber: NoQ.Nov 25 2015, 10:02 AM

I have problems with the "default" handling of expressions.

I want to warn about loss of precision for such code:

unsigned int x = 256;
unsigned char c;
c = x;

The analyser tells me that the RHS value is 0 instead of 256 in the assignment "c = x". Here is the dump of the ProgramState:

Store (direct and default bindings), 0x0 :

Expressions:
(0x7b9bd30,0x7b458d8) c : &c
(0x7b9bd30,0x7b45940) x : 0 U8b
Ranges are empty.

Hmm. I guess you need to sign up for the cast expression rather than for the binary operator expression. By the time you reach the assignment operator, the value before the cast is already removed from the environment.

Another diff.

This time I have stripped the checker to make it simpler to review, triage, etc.

It will now only warn about loss of precision in assignments when RHS is a variable.

For me, triaging these results are time consuming. It's not obvious at first glance why there is warning. I have tried to strip down the source files to smaller testcases to see where it "goes wrong" and then very often saw that the warning was a TP.

When scanning 692 projects with this checker I got 56 warnings. I've triaged 21 random warnings of these so far and saw 20 TP and 1 FP.

When I have triaged the results there is one kind of "possible FP" that I see, like this:

unsigned long x = largevalue;
bytes[0] = x;  // Possible FP here. Technically it's a TP but it's an loss of precision by design.
bytes[1] = x >> 8;
....

I am currently considering to allow this FP. Any opinions about allowing it?

When scanning 692 projects with this checker I got 56 warnings. I've triaged 21 random warnings of these so far and saw 20 TP and 1 FP.

When I have triaged the results there is one kind of "possible FP" that I see, like this:

unsigned long x = largevalue;
bytes[0] = x;  // Possible FP here. Technically it's a TP but it's an loss of precision by design.
bytes[1] = x >> 8;
....

I am currently considering to allow this FP. Any opinions about allowing it?

Is there mechanism by which the user can suppress false positives like this (for example, an explicit cast)?

danielmarjamaki marked 2 inline comments as done.Dec 7 2015, 1:01 AM

When scanning 692 projects with this checker I got 56 warnings. I've triaged 21 random warnings of these so far and saw 20 TP and 1 FP.

When I have triaged the results there is one kind of "possible FP" that I see, like this:

unsigned long x = largevalue;
bytes[0] = x;  // Possible FP here. Technically it's a TP but it's an loss of precision by design.
bytes[1] = x >> 8;
....

I am currently considering to allow this FP. Any opinions about allowing it?

Is there mechanism by which the user can suppress false positives like this (for example, an explicit cast)?

Yes it can be suppressed using for instance "x & 0xff" or "(unsigned char)x".

The checker could hide the warning if the previous or next statement is "somevar = x >> 8". I don't think that would cause any significant FN. However I still consider to write the "FP".

test/Analysis/conversion.c
23 ↗(On Diff #41858)

it was removed

danielmarjamaki marked an inline comment as done.Dec 10 2015, 8:30 AM

I have looked all warnings that I got. 1678 projects where scanned. In total I got 124 warnings.

I classified 91 warnings as TP. 14 as FP. and then there were 19 that I failed to triage. I for instance failed to triage code implemented in headers when I don't know what values function arguments will have.

My feeling for the 14 FP is that the value analysis fails and then there is not much my checker could do.

I don't see any particular use case that my checker should handle better.

For information, here are the packages I got warnings in:
ftp://ftp.se.debian.org/debian/pool/main/a/abcm2ps/abcm2ps_7.8.9.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/abcmidi/abcmidi_20151110.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/acm/acm_5.0.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/aewan/aewan_1.0.01.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/ap-utils/ap-utils_1.5.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/aspell/aspell_0.60.7~20110707.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/a/audiofile/audiofile_0.3.6.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/b/bash/bash_4.3.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/b/binutils-h8300-hms/binutils-h8300-hms_2.16.1.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/b/bison++/bison++_1.21.11.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/b/brltty/brltty_5.2~20141018.orig.tar.xz
ftp://ftp.se.debian.org/debian/pool/main/b/bvi/bvi_1.4.0.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/c/cmus/cmus_2.7.1.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/c/cone/cone_0.89.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/c/cscope/cscope_15.8b.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/d/dash/dash_0.5.8.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/d/directfb/directfb_1.4.3.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/e/elk/elk_3.99.8.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/e/erlang/erlang_18.1-dfsg.orig.tar.xz
ftp://ftp.se.debian.org/debian/pool/main/f/findutils/findutils_4.5.14.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/fish/fish_2.2.0.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/fltk1.1/fltk1.1_1.1.10.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/fltk1.3/fltk1.3_1.3.3.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/fontforge/fontforge_20120731.b.orig.tar.bz2
ftp://ftp.se.debian.org/debian/pool/main/f/fox1.6/fox1.6_1.6.50.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/freewnn/freewnn_1.1.1~a021+cvs20130302.orig.tar.xz
ftp://ftp.se.debian.org/debian/pool/main/f/frei0r/frei0r_1.4.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/f/ftnchek/ftnchek_3.3.1.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/g/gnupg/gnupg_1.4.19.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/g/gxemul/gxemul_0.4.7.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/h/haildb/haildb_2.3.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/h/hercules/hercules_3.11.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/i/ion/ion_3.2.1+dfsg.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/l/lame/lame_3.99.5+repack1.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/l/loop-aes-utils/loop-aes-utils_2.16.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/m/m17n-lib/m17n-lib_1.7.0.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/m/mbr/mbr_1.1.11.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/m/mcpp/mcpp_2.7.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/m/mtools/mtools_4.0.18.orig.tar.bz2
ftp://ftp.se.debian.org/debian/pool/main/m/musl/musl_1.1.9.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/n/nasm/nasm_2.11.06-1really2.11.05.orig.tar.xz
ftp://ftp.se.debian.org/debian/pool/main/n/netkit-ftp/netkit-ftp_0.17.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/n/nmap/nmap_7.00.orig.tar.bz2
ftp://ftp.se.debian.org/debian/pool/main/o/openct/openct_0.6.20.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/o/owl/owl_2.2.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/p/paperkey/paperkey_1.3.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/p/pnm2ppa/pnm2ppa_1.13.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/p/procps/procps_3.3.9.orig.tar.xz
ftp://ftp.se.debian.org/debian/pool/main/p/protobuf/protobuf_2.6.1.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/p/proxycheck/proxycheck_0.49a.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/p/putty/putty_0.66.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/r/rds-tools/rds-tools_1.4.1-OFED-1.4.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/r/ruby1.8/ruby1.8_1.8.7.358.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/s/sam2p/sam2p_0.49.2.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/s/sane-backends/sane-backends_1.0.26~git20151121.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/s/scheme9/scheme9_2015.11.19.orig.tar.gz
ftp://ftp.se.debian.org/debian/pool/main/s/sed/sed_4.2.2.orig.tar.bz2
ftp://ftp.se.debian.org/debian/pool/main/s/sg3-utils/sg3-utils_1.41.orig.tar.xz

This comment was removed by danielmarjamaki.

ping. I would like to have a review on this.

NoQ added a comment.EditedDec 17 2015, 3:18 AM

I've got a few minor code comments.

I really wish to have a look at false positives on which

the value analysis fails and then there is not much my checker could do

either in a form of FIXME tests, or as preprocessed code samples, because i'm currently digging the topic of improving cast modeling (either through producing SymbolCast more often, or by taking sub-symbol-types vs. symbolic-expression-types vs. ast-expression-types into account during assume() and evalBinOp() and various ExprEngine calls. Eg, when casting an int into a char and then back into an int, ensure that the resulting int value carries, at most, a char-range constraint. So i wonder if such fix would help.

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
78 ↗(On Diff #41858)

It's not quite "the value can be greater or equal", but in fact rather "the value is certainly greater or equal".
Same applies to canBeNegative().

85 ↗(On Diff #41858)

A slightly shorter way to express the same thing:

llvm::Optional<NonLoc> EVal = C.getSVal(E).getAs<NonLoc>();
if (!EVal)
  return false;
// later use (*EVal)
96 ↗(On Diff #41858)

You can use the overridden version of assume() that returns two states at once into a std::pair, in order to avoid double-work.

121 ↗(On Diff #41858)

Not sure, but shouldn't the common Expr::EvaluateAsInt() mechanism handle it all of it, including recursive traversal through operators?

142 ↗(On Diff #41858)

You don't need to call .getTypePtr(), because QualType already has a convenient overridden operator ->(), which gives you quick access to the underlying Type *.

158 ↗(On Diff #41858)

BasicValueFactory::getMaxValue(QualType) might be useful.

NoQ added a comment.Dec 17 2015, 5:34 AM

Hmm, just noticed the related work on casts in D12901, which seems to be directly related to my hand-waving above. It might accidentally be useful for reducing FPs of this checker as well.

Thanks a lot for those comments. I'll try your suggestions. I will try to upload some samples where I think the ProgramState is wrong.

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
78 ↗(On Diff #41858)

I disagree.

int A = 0;
if (X) {
     A = 1000;
}
U8 = A;  // <- Imho; A _can_ be 1000

Imho it's better to say that A _can_ be 1000 unless A is 1000 for all possible execution paths through the code.

a.sidorin added inline comments.
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
41 ↗(On Diff #41858)

Parent should always exist for an implicit cast. May be it's better to assert here?

49 ↗(On Diff #41858)

It's not evident why do you omit other Assign operators here, like BO_SubAssign, BO_AddAssign and BO_DivAssign. As I see from your test, there are some problems with them. Could you add a comment?

74 ↗(On Diff #41858)

Source sub-expression of cast expression we're visiting cannot be null and it should have non-null type. I think you can use something like E->getType()->isSignedIntegerType() instead of this function or you can use some assertions.

a.sidorin added inline comments.Dec 17 2015, 9:26 AM
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
44 ↗(On Diff #41858)

Note that InitExprs of DeclStmts are not binary operators. So you will not get a warning on initialization and this test:

void test(unsigned int p) {
  unsigned X = 1000;
  unsigned char uc = X; // expected-warning {{Loss of precision}}
}

will fail.

danielmarjamaki marked 4 inline comments as done.Dec 18 2015, 5:18 AM
danielmarjamaki added inline comments.
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
41 ↗(On Diff #41858)

If I comment out lines 41 and 42 and run "make check-all" I get:

FAIL: Clang :: Analysis/misc-ps-region-store.cpp (668 of 24574)
******************** TEST 'Clang :: Analysis/misc-ps-region-store.cpp' FAILED ********************
....
#0 ...
#1 ...
...
#9 0x0000000002f1d062 llvm::isa_impl_cl<clang::BinaryOperator, clang::Stmt const*>::doit(clang::Stmt const*) /home/danielm/llvm/include/llvm/Support/Casting.h:96:0
44 ↗(On Diff #41858)

I have limited this patch hoping that it will be easier to triage and review. Right now I only warn if there is assignment and RHS is a DeclRefExpr and only for loss of precision. I will definitely look into initializations also later.

49 ↗(On Diff #41858)

DivAssign is obvious; if the rhs is too large there won't be loss of precision.
But I fail to think of additional problems to handle BO_AddAssign and BO_SubAssign also immediately.. I'll add it in next patch

158 ↗(On Diff #41858)

Imho this seems harder.

To start with the getMaxValue returns 127 for a signed 8-bit integer.

I don't want to warn here:

S8 = 0xff;  // <- no loss of precision, all bits are saved

So instead of 1 line I'd get 4+ lines:

SValBuilder &Bldr = C.getSValBuilder();
BasicValueFactory &BVF = Bldr.getBasicValueFactory();
QualType U = ResultType...  // I don't know how to convert ResultType
const llvm::APSInt &MaxVal1 = BVF.getMaxValue(U);

Maybe I am missing something. Let me know if I do.

danielmarjamaki marked an inline comment as done.

Refactorings thanks to review comments

danielmarjamaki marked 7 inline comments as done.Jan 20 2016, 7:27 AM
danielmarjamaki marked 2 inline comments as done.

fixed review comments, readded 'loss of sign' checking

danielmarjamaki marked 2 inline comments as done.Jan 26 2016, 4:11 AM

For information, I am testing this patch right now.. it will take a while 1-2 days.

For information, I am testing this patch right now.. it will take a while 1-2 days.

I have run my latest patch..

In 2215 projects it found 875 warnings.

For comparison the -Wconversion generates ~1.5 million warnings in these projects. So it's drastically less noisy.

I have triaged random warnings of those 875. I classified 11 warnings as false positive, 85 as true positive, 3 as dontknow (I fail to determine if there is or is not truncation).

Why is there such a large jump in the number of warnings reported in the last patch iteration?
It went from "1678 projects where scanned. In total I got 124 warnings" to "In 2215 projects it found 875 warnings." Did the number of warnings in the initial 1678 projects stay the same?

Is it possible to take a look at the nature of the false positives, as per NoQ's request above?

This checker would benefit greatly from explaining why the errors occur. For example, where the values of variables are being constrained. Other checkers use BugReporterVisitor to generate the rich diagnostic information. Dow you have plans on how to accomplish that for this checker?

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
11 ↗(On Diff #45972)

Please, make this more specific.
Would be useful to summarize which heuristics you used to bring the number of false positives down.

29 ↗(On Diff #45972)

nit: Please move the body out of the declaration.

74 ↗(On Diff #45972)

Please, capitalize the sentence.

84 ↗(On Diff #45972)

This function returns true if the value "is" greater or equal, not "can be" greater or equal. The latter would be "return StGE".

Also, it's slightly better to return the StGE state and use it to report the bug. This way, our assumption is explicitly recorded in the error state.

147 ↗(On Diff #45972)

Please, use capitalization and punctuation.

Why is there such a large jump in the number of warnings reported in the last patch iteration?
It went from "1678 projects where scanned. In total I got 124 warnings" to "In 2215 projects it found 875 warnings." Did the number of warnings in the initial 1678 projects stay the same?

No I improved the check.

My previous patch was limited: I have limited this patch hoping that it will be easier to triage and review. Right now I only warn if there is assignment and RHS is a DeclRefExpr and only for loss of precision.

Is it possible to take a look at the nature of the false positives, as per NoQ's request above?

I'll try when I get some time.

This checker would benefit greatly from explaining why the errors occur. For example, where the values of variables are being constrained. Other checkers use BugReporterVisitor to generate the rich diagnostic information. Dow you have plans on how to accomplish that for this checker?

Yes that sounds like a good idea.. I''ll try to look at that.

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
84 ↗(On Diff #45972)

NoQ made the same comment. I disagree.

int A = 0;
if (X) {
     A = 1000;
}
U8 = A;  // <- Imho; A _can_ be 1000

Imho it's better to say that A _can_ be 1000 unless A is 1000 for all possible execution paths through the code.

Do you still think "is" is better than "can be"?

zaks.anna added inline comments.Mar 22 2016, 11:05 AM
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
84 ↗(On Diff #45972)

The Clang Static Analyzer performs path sensitive analysis of the program. (It does not merge the paths at the "U8 = A" statement!!!) You will only be changing the state along a single execution path of this program. Along that path, A will always be 1000.

When analyzing your example, the analyzer is going to separately analyze 2 paths:
1st path: A=0; X != 0; A =1000; U8 = A; Here U8 is definitely 1000.
2d path: A=0; X == 0; U8 = A;
Here U8 is definitely 0.

This video contains an intuitive explanation of symbolic execution technique we use: http://llvm.org/devmtg/2012-11/videos/Zaks-Rose-Checker24Hours.mp4

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
84 ↗(On Diff #45972)

I understand that and I still think that value of A "can be" 1000. Yes in that path the value "is" 1000.

But as far as I see, you and others disagree with me. And therefore I will change to "is".

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
84 ↗(On Diff #45972)

For your information in Cppcheck I say that a value is "possible" if some path(s) generates that value. And "always" when all paths generate that value.

Code example:

int f(int x) {
  int a = 1000;
  int b = 0;
  if (x == 500)
    a = 3;
  return a + b - x;
}

Debug output (cppcheck --debug-normal file.c):

##Value flow
Line 3
  1000 always 1000
Line 4
  0 always 0
Line 5
  x possible 500
  == possible 1
  500 always 500
Line 6
  3 always 3
Line 7
  a possible {1000,3}
  + possible {1000,3}
  b always 0
  x possible 500

For me personally it is confusing to say that A "is" 1000. That is different to how I normally think of it in Cppcheck.

NoQ added inline comments.Mar 23 2016, 4:29 AM
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
84 ↗(On Diff #45972)

Consider the following examples:

// Example 1.
int a = rand();
use(a);
// Example 2.
int b;
scanf("%d", &b);
use(b);

Here a and b "can be" 1000 when passed into use(), however your function would return false, because there is no particular execution path found on which they are "certainly" 1000.

On the other hand, in the following example:

// Example 3.
int a = rand();
if (a == 1000) {
}
use(a);

at the use() of a your function would return false on the path that passes through one branch and true on another path, however the false return value would not indicate that a "cannot be" 1000 on this line; it simply indicates that a is not certainly 1000 on one (but not all) of the paths.

These examples show that the function returns "false" much more often than its name suggests, hence we propose a different terminology. The good name for the function would be "the value 'certainly is' [greater or equal] on this particular path, though probably not on every path".

Additionally, example 2 is of particular interest: it gives an example of a "tainted" symbol which is explicitly known to take all values that fit its integral type, depending on user (cf. "attacker") input. You might (some day, no rush, i guess) like to extend your checker to consider tainted values as truly "can-be-anything" and throw warnings even without finding a particular path. For such cases, your function would actually be something like "there exists a path on which it 'certainly can be', though maybe on other paths it's not as certain". There's already a basic support for such taint analysis in the analyzer.

In fact, example 1 is also a curious discussion point - even when there's no attacker to substitute your random number generator, symbols produced by rand() and such are also explicitly known to take all values between 0 and some kind of RAND_MAX, which is a sort of information that may be useful to the analyzer and can be described as a weaker kind of taint without much security implications, but still not something that can be represented as presence or lack of range constraints.

Here are some false positives I have marked... let me know if you want more...

I have marked this as a FP:

URL: ftp://ftp.se.debian.org/debian/pool/main/m/m17n-lib/m17n-lib_1.7.0.orig.tar.gz
File: m17n-lib-1.7.0/src/mtext.c
Line 1946

Code:

int mtext_set_char (MText *mt, int pos, int c) {
    ...
    switch (mt->format)
    {
        case MTEXT_FORMAT_US_ASCII:
            mt->data[pos_unit] = c;  // <- WARNING
            break;
      .....
    }
}

The precision of "c" can be too large for "mt->data[pos_unit]" if the format is not MTEXT_FORMAT_US_ASCII.

I believe it's a FP but it can probably be fixed by an assertion inside the "case".

I have marked this as a FP:

URL: ftp://ftp.se.debian.org/debian/pool/main/n/netkit-ftp/netkit-ftp_0.17.orig.tar.gz
File: netkit-ftp-0.17/ftp/ftp.c
Line 416

Code:

		while ((c = getc(cin)) != '\n') {
                ...
                if (c == EOF) {
                    ....
                    return 4;
                }

                ...
                *cp++ = c;  // <- WARNING
            }

I have marked this as a FP:

URL: ftp://ftp.se.debian.org/debian/pool/main/o/owl/owl_2.2.2.orig.tar.gz
File: owl-2.2.2.orig/keypress.c
Line: 185

Code:

if (isascii(j)) {
  kb2[0] = j;  // <- WARNING
  kb2[1] = 0;
  strcat(kb, kb2);    
}

If isascii() returns 1 then j is a 7-bit value. Then there is not loss of precision.

Minor fixes of review comments.

  • Improved comments about the checker in the source code.
  • Improved capitalization and punctuation in error messages.
  • Renamed "canBe..." functions and changed their return type to ProgramStateRef to get better diagnostics.
danielmarjamaki marked 3 inline comments as done.Mar 23 2016, 5:49 AM
danielmarjamaki added inline comments.
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
12 ↗(On Diff #51402)

Thanks! I have tried to do that.

30 ↗(On Diff #51402)

ok

85 ↗(On Diff #51402)

I renamed and changed these functions. Hope we all like it better now. The name is now "greaterEqualState" and it returns the state when the value is greater or equal. If there is no such state it returns nullptr.

As far as I see the diagnostics are showing the proper path now..

148 ↗(On Diff #51402)

sorry about that, I have fixed it

Could you add the reduced false positives to the tests file?

As far as I see the diagnostics are showing the proper path now..

What do you mean? Does this refer to supplying more information the the path about why the error occurs?

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
12 ↗(On Diff #51402)

Can you describe what it does without referencing Wconversion? A reader might not know what that warning does.

85 ↗(On Diff #51402)

Cppcheck is probably performing control-flow sensitive analysis, which is completely different than the algorithm that is used by the Clang Static Analyzer, which performs path-sensitive analysis. The meaning of may and must are different. It is very important to understand the fundamentals behind how the analyzer works. Please, watch the video and let me know if you are interested in more information on symbolic execution.

When we talk about possible values for a variable(or a symbol) along one path in the static analyzer we can have both may and must:

  • may be greater than: StGE != nullptr
  • is greater than: StGE && !StLT

The whole point behind "assumeDual" is to allow us to differentiate between them. Your function specifically only cares about the "must" case. It would be a bug if it checked that the state is a "may" state.

danielmarjamaki marked 2 inline comments as done.

Minor fixes of review comments. Renamed functions to "isNegative" and "isGreaterEqual" and return bool. Updated comment. Added testcases for false positives I have seen.

Could you add the reduced false positives to the tests file?

As far as I see the diagnostics are showing the proper path now..

What do you mean? Does this refer to supplying more information the the path about why the error occurs?

Sorry.. I was too quick. I changed back to "bool".

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
13 ↗(On Diff #52029)

ok, hope this is good.

85 ↗(On Diff #52029)

I renamed to "isNegative" and "isGreaterEqual"

Make sure patch can be applied in latest trunk.

Ping.

NoQ added a comment.Aug 10 2016, 9:53 AM

Wow, i've completely forgot about this one. Sorry about that... I think this checker is good to have in the current shape, and probably incrementally developed. It'd probably move out of alpha whenever it's tweaked to somehow make sure it finds enough real bugs among intentional integer truncations, which would most likely require some nontrivial heuristics.

Added a few very minor comments, once they're fixed i'd consider committing(?)

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
10 ↗(On Diff #67470)

not => no

21 ↗(On Diff #67470)

has => have

66 ↗(On Diff #67470)

I think you can assert that the parent exists here.

87 ↗(On Diff #67470)

You are generating a fatal error node, which stops the analysis on the path on which the bug is found. I don't think it's desired, because the program isn't known to terminate upon loss of sign/precision, and it suppresses other warnings by other checkers. I think you need to use generateNonFatalErrorNode() here. But then you need to make sure you generate this node exactly once per callback and re-use for all reports in case there are multiple reports, otherwise the execution path behind this node may be duplicated.

fixed review comments

danielmarjamaki marked 3 inline comments as done.Aug 12 2016, 3:43 AM
danielmarjamaki added inline comments.
lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
67 ↗(On Diff #67817)

I get assertion failure then when running this test: Analysis/misc-ps-region-store.cpp

Command that fails:

/home/danielm/llvm/build/./bin/clang -cc1 -internal-isystem /home/danielm/llvm/build/bin/../lib/clang/4.0.0/include -nostdsysteminc -triple i386-apple-darwin9 -analyze -analyzer-checker=core,alpha.core,debug.ExprInspection -analyzer-store=region -verify -fblocks -analyzer-opt-analyze-nested-blocks /home/danielm/llvm/tools/clang/test/Analysis/misc-ps-region-store.cpp -fexceptions -fcxx-exceptions -Wno-tautological-undefined-compare

NoQ accepted this revision.Aug 17 2016, 8:35 AM
NoQ added a reviewer: NoQ.

Looks good!

lib/StaticAnalyzer/Checkers/ConversionChecker.cpp
67 ↗(On Diff #67817)

Aha! Had a look: the parent is a CXXCtorInitializer, which is outside this function's body, and hence not covered with the current StackFrameContext's ParentMap. So you're losing positives in constructor initializers, and fixing it would take some code.

This revision is now accepted and ready to land.Aug 17 2016, 8:35 AM
This revision was automatically updated to reflect the committed changes.