This is an archive of the discontinued LLVM Phabricator instance.

[StaticAnalyzer] Fix ProgramState for static variables that are not written
AbandonedPublic

Authored by danielmarjamaki on Sep 15 2017, 2:15 AM.

Details

Summary

I saw a false positive where the analyzer made wrong conclusions about a static variable.

Static variables that are not written have known values (initialized values).

This is the (simplified) code that motivated me to create this patch:

static char *allv[] = {
	"rpcgen", "-s", "udp", "-s", "tcp",

};
static int allc = sizeof(allv) / sizeof(allv[0]);

static void f(void) {
	int i;

	for (i = 1; i < allc; i++) {
		const char *p = allv[i];  // <- line 28
		i++;
	}
}

Clang output:

array-fp3.c:28:19: warning: Access out-of-bound array element (buffer overflow)
                const char *p = allv[i];
                                ^~~~~~~

I added testcases that shows this patch solves both false positives and false negatives

Diff Detail

Repository
rL LLVM

Event Timeline

Minor cleanups. Changed names. Updated comments.

xazax.hun edited edge metadata.Sep 15 2017, 3:16 AM

Out of curiosity, does the false positive disappear after making the static variables const?

Out of curiosity, does the false positive disappear after making the static variables const?

Yes that fixes the false positive

I have some comments and questions but maybe you do not want to address those until Devin, NoQ, or Anna approved the general directions.

lib/StaticAnalyzer/Core/ExprEngine.cpp
107

Usually, we do not like bug recursions since it might eat up the stack.
Also, do you consider the case when the variable is passed by reference to a function in another translation unit?

169

Does this work for non-integer typed e.g. structs?

NoQ edited edge metadata.Sep 19 2017, 11:07 AM

The overall idea makes sense to me. I'd like you to join the effort with Peter who during his work on loop widening came up with a matcher-based procedure for finding out if a variable is changed anywhere; it currently lives in LoopUnrolling.cpp and we need only once implementation of that.

Fixes according to review comments. Reuse ast matchers in LoopUnrolling.cpp. Avoid some recursion (however the isChanged() is still recursive but it is very small and simple).

danielmarjamaki marked an inline comment as done.Oct 6 2017, 12:20 AM
danielmarjamaki added inline comments.
lib/StaticAnalyzer/Core/ExprEngine.cpp
107

I rewrote the code to reuse the ast matchers in LoopUnroll.. and that is not recursive. I rewrote the getGlobalStaticVars (this code is longer and in my opinion less readable but it's not recursive).

169

Currently static struct objects are unaffected by these changes. There is no regression.

Reviews are taking too long time. I am not against getting reviews but I am against waiting for reviews for months, ideally people would only need to wait for at most a week to get a review. This is why I don't want to handle structs now -- I am not eager to prolong the review process for this patch.

danielmarjamaki marked an inline comment as done.Oct 6 2017, 12:21 AM
szepet added a comment.Oct 6 2017, 7:58 AM

Hello Daniel!

It is a great feature to add, thanks for working on this!
I have just small comments (rather questions) on the code.

lib/StaticAnalyzer/Core/ExprEngine.cpp
155

I think instead of this logic it would be better to use ConstStmtVisitor. In this case it does quite the same thing in a (maybe?) more structured manner. What do you think?

168

Is it possible that a type is an IntegerType and a PointerType at the same time? If these are excluding cases then the check for !isPointer could be removed.

dcoughlin requested changes to this revision.Oct 9 2017, 6:17 PM

Apologies for the delay reviewing! As I noted inline, I'm pretty worried about the performance impact of this. Is it possible to do the analysis in a single traversal of the translation unit?

lib/StaticAnalyzer/Core/ExprEngine.cpp
123

Since you are calling getInitialStateForGlobalStaticVar() in getInitialState() for each static variable declaration and getInitialState() is called for each top-level function, you are doing an n^3 operation in the size of the translation unit, which is going to be very, very expensive for large translation units.

Have you considered doing the analysis for static variables that are never changed during call-graph construction? This should be a linear operation and doing it during call-graph construction would avoid an extra walk of the entire translation unit.

This revision now requires changes to proceed.Oct 9 2017, 6:17 PM

Apologies for the delay reviewing! As I noted inline, I'm pretty worried about the performance impact of this. Is it possible to do the analysis in a single traversal of the translation unit?

I agree. I first tried more like that but ran into problems. Don't remember the details. I will try again.. however as far as I see this will mean the LoopUnroller AST matchers can't be reused unless I change them.

lib/StaticAnalyzer/Core/ExprEngine.cpp
123

hmm... could you tell me where the call-graph construction is that I can tweak?

lib/StaticAnalyzer/Core/ExprEngine.cpp
123

I think I found it: clang/lib/Analysis/CallGraph.cpp

danielmarjamaki edited edge metadata.

Track modification of global static variables in CallGraph construction

lib/StaticAnalyzer/Core/ExprEngine.cpp
123

I now track variable modifications in call-graph construction instead.

155

As far as I see ConstStmtVisitor is also recursive. Imho let's have readable code instead..

NoQ added a comment.Oct 30 2017, 7:55 AM

however as far as I see this will mean the LoopUnroller AST matchers can't be reused unless I change them.

Hmm. What is the problem here? I'm expecting library linkage issues (i.e. libAnalysis is not linked to libASTMatchers directly), but i guess it'd better to have the whole thing in libAnalysis then (Peter, WDYT?). Or maybe we could move the code around a bit, so that we're still in libStaticAnalyzerCore but at roughly the same moment of time.

Because you're duplicating the solution for exact same problem, using different technology, and this problem is not trivial (eg. you still need to consider reference-type declstmts and initializer lists which are not taken into account in your solution yet) so i'd dislike the idea of duplicating it.

lib/Analysis/CallGraph.cpp
104 ↗(On Diff #118947)

A hint, even if we don't use visitors: the whole point of having a visitor is about not needing to make a pattern-matching (switch or sequence of dyn_casts) by statement kind yourself, like you do in this function. You already have VisitUnaryOperator, VisitBinaryOperator, VisitCallExpr, etc., so you can add the code there.

danielmarjamaki abandoned this revision.Jan 15 2018, 12:30 AM

I will not continue working on this. Feel free to take over the patch or write a new patch.