This is an archive of the discontinued LLVM Phabricator instance.

Differential D57230

[analyzer] Toning down invalidation a bit
ClosedPublic

Authored by xazax.hun on Jan 25 2019, 4:42 AM.


Details

Reviewers

george.karpenkov
Szelethus
NoQ

Commits

rG3d90e7e8db2c: Revert "[analyzer] Toning down invalidation a bit".
rC357620: Revert "[analyzer] Toning down invalidation a bit".
rL357620: Revert "[analyzer] Toning down invalidation a bit".
rGf41e3d087344: [analyzer] Toning down invalidation a bit
rC352473: [analyzer] Toning down invalidation a bit
rL352473: [analyzer] Toning down invalidation a bit

Summary

This is a patch for the following discussion on the mailing list: http://clang-developers.42468.n3.nabble.com/analyzer-Toning-down-invalidations-td4058816.html

The consensus is that, while this approach might temporarily increase the false positive rate a bit, systems of mutually-canceling bugs are worth untangling.
Most of the extra results are false positives not because of the reduced invalidation but for other reasons, so we could focus on those problems instead of having them hidden.

Another consideration is that we actually introduce a new class of false positives: cases where a function does offset trickery with fields, reaching other parts of the object from a pointer to one field. I think such functions should have a special annotation to suppress such false positives.
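To make the "offset trickery" case concrete, here is a minimal sketch. The names `Pair`, `resetWholePair` and `firstAfterReset` are made up for illustration and do not come from the patch or its tests. The callee receives only the address of one field, yet it also modifies a sibling field, which an analyzer that invalidates only the passed field would not expect.

```cpp
#include <cassert>
#include <cstddef>

struct Pair {
  int first;
  int second;
};

// Hypothetical callee: it is handed a pointer to `second` only, but steps
// back to the enclosing Pair with offsetof and overwrites `first` as well.
void resetWholePair(int *secondField) {
  char *raw = reinterpret_cast<char *>(secondField) - offsetof(Pair, second);
  Pair *whole = reinterpret_cast<Pair *>(raw);
  whole->first = 0;
  whole->second = 0;
}

// Caller that only ever passes the address of `second`.
int firstAfterReset() {
  Pair p{1, 2};
  resetWholePair(&p.second);
  // If only `p.second` is invalidated by the call, an analyzer would still
  // assume `p.first == 1` here, although the callee changed it.
  return p.first;
}
```

This is the kind of function the proposed annotation would be meant for; the sketch does not show the annotation, since its spelling is still an open question.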

There are a few questions regarding that:

What should be the spelling of such an annotation?
How to handle indirect calls? Even if we were in an ideal world where all the callees are annotated, we might not know who the actual callee is (function pointers, virtual calls etc). So what should we do? Less or more invalidation for all indirect calls or have a separate mechanism to let the user define at the call site how to handle a specific call?

Some other ideas from Artem on the mailing list:

Relaxing the C++ container inlining heuristic, i.e. replacing it with visitor-based suppressions, so as to still enjoy the benefits of inlining. This will also likely result in less invalidation, but it could have a severe effect on how and where we spend our budget (given the complexity of STL implementations for performance tuning).
It shouldn't be all that hard to model extents of bindings within RegionStore, so that bindings to sub-structures didn't overwrite bindings to super-structures simply because they have the same base region and the same offset. The only problem here is to model extents of *integers* because we don't represent casts as part of SymbolRefs. All other sorts of SVals have well-defined extents (including, say, lazy compound values).

Diff Detail

Repository: rL LLVM

Event Timeline

xazax.hun created this revision. Jan 25 2019, 4:42 AM

Herald added subscribers: gamesh411, dkrupp, donat.nagy and 6 others. Jan 25 2019, 4:42 AM

xazax.hun edited the summary of this revision. Jan 25 2019, 4:44 AM

Let's also have a link to your cfe-dev mail in this patch: http://lists.llvm.org/pipermail/cfe-dev/2019-January/060968.html

Overall, I like this quite a bit, as I personally experienced the consequences of this invalidation technique while developing UninitializedObjectChecker. I'd be interested to see how many more reports it will emit with this patch; I have a suspicion that it'll be very significant.

Let's wait for what @NoQ thinks of this patch.

What should be the spelling of such an annotation?

How about these: uses_offsetof, may_use_offsetof?

How to handle indirect calls? Even if we were in an ideal world where all the callees are annotated, we might not know who the actual callee is (function pointers, virtual calls etc). So what should we do? Less or more invalidation for all indirect calls or have a separate mechanism to let the user define at the call site how to handle a specific call?

Hmm, I suspect that such functions are few and far between, so maybe the inconvenience of annotating calls made through a function pointer to a function that may use offsetof can be justified.
For virtual methods, I guess we can't expect the user to always be able to add the annotation to the first base where the virtual method is declared, and not even CTU can ensure that we can scan all methods that implement it. Maybe for these exceptionally rare cases, annotating the actual calls to the functions would also be justified.

Sadly, I can't say anything meaningful about Artem's ideas off the top of my head. :)

test/Analysis/cxx-uninitialized-object.cpp
371 ↗	(On Diff #183516)	I bet that the current invalidation technique crippled much of this checker's capabilities, so I'm happy to see it change. Hooray!

This revision is now accepted and ready to land. Jan 25 2019, 5:53 AM

Thanks! A surprise, to be sure :) I'll try to test this on my set of projects as well :)

Most of the extra results are false positives not because of the reduced invalidation but for other reasons, so we could focus on those problems instead of having them hidden.

Could you share reproducible examples for these, probably in the form of FIXME tests? Given that they are "regressions", they are easy to creduce down to a small repro by using the test "there is still a change in behavior on this file".

lib/StaticAnalyzer/Core/CallEvent.cpp
320–321 ↗	(On Diff #183516)	I suspect that the trait for non-base `MR` would never be read. The only place where this trait is accessed is in RegionStore.cpp where it asks whether the trait is applied to a cluster base, which is always a base region.
test/Analysis/call-invalidation.cpp
146 ↗	(On Diff #183516)	Let's leave at least one positive check around, eg. demonstrate that invalidation does happen for `s1.y` here.

In D57230#1372275, @NoQ wrote:

Could you share reproducible examples for these, probably in the form of FIXME tests? Given that they are "regressions", they are easy to creduce down to a small repro by using the test "there is still a change in behavior on this file".

I think the most common cause of false positives is infeasible paths. Do you have success reducing false positives using creduce? My problem usually is that we cannot tell if a reduction rendered a false positive into a true positive.

lib/StaticAnalyzer/Core/CallEvent.cpp
320–321 ↗	(On Diff #183516)	I saw some test failures when I always used the base region. I suspect the reason is that `InvalidateRegionsWorker::AddToWorkList` will add the region itself instead of the base region when `TK_DoNotInvalidateSuperRegion` is set. So if we only set the `TK_PreserveContents` trait for the base region, `InvalidateRegionsWorker::VisitCluster` will not see the `TK_PreserveContents` trait. In fact, the naming of regions in those functions is very confusing. Even though the formal parameter is called `baseR`, my suspicion is that we might visit non-base regions (due to the `TK_DoNotInvalidateSuperRegion` trait).

Added some tests

In D57230#1372488, @xazax.hun wrote:

In D57230#1372275, @NoQ wrote:

Do you have success reducing false positives using creduce? My problem usually is that we cannot tell if a reduction rendered a false positive into a true positive.

False positives - no. Improvements and regressions - totally! Just run two different clangs in the creduce test and check that there's a difference in results.

In D57230#1372523, @NoQ wrote:

In D57230#1372488, @xazax.hun wrote:

In D57230#1372275, @NoQ wrote:

Do you have success reducing false positives using creduce? My problem usually is that we cannot tell if a reduction rendered a false positive into a true positive.

False positives - no. Improvements and regressions - totally! Just run two different clangs in the creduce test and check that there's a difference in results.

Oh, I see. Great idea, I never did this. Will look into it.

I tried to creduce one file where the result differed and this is the result:

typedef struct {
  int a;
  int b
} c;
d;
e(c *f) {
  d < f->a;
  c g;
  h(&g.b);
  e(&g);
}

I think the core idea is quite straightforward, but this example is a bit convoluted due to the recursion. I do not see any value in adding this to the regression tests, as this case is already covered there. Do you think I should try to reduce additional files?

I'm in favor of this change; I never understood why invalidating a field invalidates the entire structure.

In D57230#1373721, @xazax.hun wrote:

Do you think I should try to reduce additional files?

Aha, ok, it reduced an interesting positive into a non-interesting positive. So I guess my method only works when you're catching changes that are more unexpected than this one :) Ok, nvm then, thank you for trying this out! I didn't have time to evaluate this change yet, but that definitely shouldn't block you from committing.

In D57230#1373721, @xazax.hun wrote:

I think the core idea is quite straightforward, but this example is a bit convoluted due to the recursion.

Hint: You can almost always "unroll" recursions or loops in reduced tests by looking at the Exploded Graph, figuring out how many times they were actually executed before the bug was found, and then copy-pasting the code that many times.

lib/StaticAnalyzer/Core/CallEvent.cpp
320–321 ↗	(On Diff #183516)	Aha, yeah, you're right! The word "Cluster" is also confusing because it is usually used for `ClusterBindings` :)

Closed by commit rL352473: [analyzer] Toning down invalidation a bit (authored by xazax). Jan 29 2019, 2:28 AM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: llvm-commits. Jan 29 2019, 2:28 AM

Thanks for all the reviews. Do you have any preference about the spelling of the annotation mentioned in the description?

There were two ideas so far: uses_offsetof, may_use_offsetof

While I like those, I wonder if it is a good idea to have offsetof in the name. One might use other methods, e.g. cast the address of the first field to a pointer to struct to access other members.

Hmm. writes_to_superobject?

Herald added a project: Restricted Project. Feb 5 2019, 11:47 AM

There seem to be a few regressions - weird memory leaks of inner objects in C++ destructors. Trying to investigate/reproduce.

In D57230#1387834, @NoQ wrote:

There seem to be a few regressions - weird memory leaks of inner objects in C++ destructors. Trying to investigate/reproduce.

Oh, that is unfortunate. Feel free to share a repro as soon as you have one and I will also try to look into it. The last time I saw strange leaks, they were due to the fact that we did not invalidate symbols after unmodelled casts. But if these warnings were introduced by this change, it is probably something else.

That's the one:

typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t);

void escape(int **);

struct S {
  int *ptr;
};

void foo() {
  struct S s1;
  s1.ptr = malloc(sizeof(int));
  escape(&s1.ptr);
}

After the patch the allocated symbol no longer escapes. It didn't end up having much to do with destructors. I'll also think about it a bit more.

I think I might have a theory, but I would like to discuss it as I am not familiar with the internals of the bindings.

My theory is the following: when we store the bindings, we store them in a map where the key is a base region.
So when we try to look the bindings up with a non-base region, we will not get any bindings.

So in our current case, we end up having a non-base region in the worklist of InvalidateRegionsWorker.
ClusterAnalysis::RunWorkList will look up the cluster for the non-base region.
Without a cluster, we will not visit the bindings. Without visiting the bindings, we will not invalidate the symbols.
With no symbols to invalidate, the checkers will not get notified.
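The theory above can be illustrated with a toy model. This is only a sketch: `Region`, `Cluster`, `Bindings`, `visit` and `symbolsSeen` are simplified stand-ins invented for this example, not the real RegionStore or ClusterAnalysis types. It assumes, as the theory does, that bindings are keyed by base region only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for analyzer concepts; none of these are real Clang types.
struct Region {
  std::string name;
  const Region *super; // nullptr for a base region
  const Region *base() const { return super ? super->base() : this; }
};

using Cluster = std::vector<std::string>;           // symbols bound in one cluster
using Bindings = std::map<const Region *, Cluster>; // keyed by base region only

// Returns the symbols a worklist visit would see for region R.
Cluster visit(const Bindings &B, const Region *R, bool lookUpViaBase) {
  auto It = B.find(lookUpViaBase ? R->base() : R);
  return It == B.end() ? Cluster{} : It->second;
}

// Models `escape(&s1.ptr)`: the worklist holds the field region `s1.ptr`,
// while the binding for the malloc'ed symbol lives under the base `s1`.
std::size_t symbolsSeen(bool lookUpViaBase) {
  Region s1{"s1", nullptr};
  Region ptrField{"s1.ptr", &s1};
  Bindings B{{&s1, {"heap_sym"}}};
  return visit(B, &ptrField, lookUpViaBase).size();
}
```

In this model `symbolsSeen(false)` finds nothing, so no symbol would be invalidated and no checker notified, while `symbolsSeen(true)` finds the heap symbol. Whether the real fix should look up via the base region, and how many of the bindings it should then visit, is exactly the open question raised in the comment.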

I think the whole ClusterAnalysis is flawed at this point. Most of the code expects to only see base regions, but some code paths might end up adding non-base regions.

So the question is, what should be the proper way to handle the TK_DoNotInvalidateSuperRegion trait?
Maybe we should always look up the bindings using the base region. But if we do, should we actually visit all of the bindings?

I did not have time yet to play with the possible solutions and will come back to this problem soon, just wanted to write down what I got so far.

Experimental patch is up in https://reviews.llvm.org/D58121
Unfortunately, it is not perfect yet.

Herald added a subscriber: jdoerfert. Feb 12 2019, 7:26 AM

Hmm, here's another one:

typedef __typeof(sizeof(int)) size_t;
void *malloc(size_t);

struct ListInfo {
  struct ListInfo *next;
};

struct X {
  struct ListInfo li;
  int i;
};

void list_add(struct ListInfo *list, struct ListInfo *item);

void foo(struct ListInfo *list) {
  struct X *x = malloc(sizeof(struct X));
  list_add(list, &x->li); // will free 'x'.
}

People are C-style-inheriting from a list item base, and are then happy to release the memory through a pointer to a field. Now we're reporting a memory leak on such code.

It looks as if we should have somehow disabled invalidation but not pointer escape for the base region.
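A small sketch of why such code is legitimate. The helpers `makeX`, `releaseThroughField` and `liveAfterRoundTrip` and the `liveObjects` counter are hypothetical additions for illustration, written in C++ rather than C. Because `li` is the first member of the standard-layout struct `X`, a pointer to it has the same address as the enclosing object, so the owner of the list can release the whole allocation through it.

```cpp
#include <cassert>
#include <cstdlib>

struct ListInfo {
  ListInfo *next;
};

struct X {
  ListInfo li; // first member: &x->li has the same address as x
  int i;
};

static int liveObjects = 0; // bookkeeping for this example only

X *makeX() {
  X *x = static_cast<X *>(std::malloc(sizeof(X)));
  if (x) {
    x->li.next = nullptr;
    x->i = 0;
    ++liveObjects;
  }
  return x;
}

// Hypothetical list owner: it only ever sees ListInfo pointers, yet it
// releases the whole enclosing X by casting the field pointer back.
void releaseThroughField(ListInfo *item) {
  if (!item)
    return;
  std::free(reinterpret_cast<X *>(item));
  --liveObjects;
}

int liveAfterRoundTrip() {
  X *x = makeX();
  releaseThroughField(x ? &x->li : nullptr);
  return liveObjects; // nothing is leaked at run time
}
```

At run time nothing leaks, so a leak report on the caller would be a false positive unless passing `&x->li` is treated as a pointer escape of `x`, even if the contents of `x` outside `li` are not invalidated.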

Herald added a subscriber: Charusso. Mar 18 2019, 5:32 PM

NoQ mentioned this in D58121: [analyzer][WIP] Attempt to fix traversing bindings of non-base regions in ClusterAnalysis. Mar 20 2019, 8:38 PM

Revision Contents

Path                                                    Size
cfe/trunk/lib/StaticAnalyzer/Core/CallEvent.cpp         22 lines
cfe/trunk/test/Analysis/call-invalidation.cpp           5 lines
cfe/trunk/test/Analysis/cxx-uninitialized-object.cpp    5 lines
cfe/trunk/test/Analysis/malloc.c                        4 lines
cfe/trunk/test/Analysis/taint-generic.c                 1 line
cfe/trunk/test/Analysis/taint-tester.c                  2 lines

Diff 184047

cfe/trunk/lib/StaticAnalyzer/Core/CallEvent.cpp

@@ @@ ProgramStateRef CallEvent::invalidateRegions(unsigned BlockCount,
   // Indexes of arguments whose values will be preserved by the call.
   llvm::SmallSet<unsigned, 4> PreserveArgs;
   if (!argumentsMayEscape())
     findPtrToConstParams(PreserveArgs, *this);

   for (unsigned Idx = 0, Count = getNumArgs(); Idx != Count; ++Idx) {
     // Mark this region for invalidation. We batch invalidate regions
     // below for efficiency.
-    if (PreserveArgs.count(Idx))
-      if (const MemRegion *MR = getArgSVal(Idx).getAsRegion())
-        ETraits.setTrait(MR->getBaseRegion(),
-                        RegionAndSymbolInvalidationTraits::TK_PreserveContents);
-        // TODO: Factor this out + handle the lower level const pointers.
+    if (const MemRegion *MR = getArgSVal(Idx).getAsRegion()) {
+      bool UseBaseRegion = true;
+      if (const auto *FR = MR->getAs<FieldRegion>()) {
+        if (const auto *TVR = FR->getSuperRegion()->getAs<TypedValueRegion>()) {
+          if (!TVR->getValueType()->isUnionType()) {
+            ETraits.setTrait(MR, RegionAndSymbolInvalidationTraits::
+                                     TK_DoNotInvalidateSuperRegion);
+            UseBaseRegion = false;
+          }
+        }
+      }
+      // todo: factor this out + handle the lower level const pointers.
+      if (PreserveArgs.count(Idx))
+        ETraits.setTrait(
+            UseBaseRegion ? MR->getBaseRegion() : MR,
+            RegionAndSymbolInvalidationTraits::TK_PreserveContents);
+    }

     ValuesToInvalidate.push_back(getArgSVal(Idx));

     // If a function accepts an object by argument (which would of course be a
     // temporary that isn't lifetime-extended), invalidate the object itself,
     // not only other objects reachable from it. This is necessary because the
     // destructor has access to the temporary object after the call.
     // TODO: Support placement arguments once we start

cfe/trunk/test/Analysis/call-invalidation.cpp

 void useAnything(void *);
 void useAnythingConst(const void *);

 void testInvalidationThroughBaseRegionPointer() {
   PlainStruct s1;
   s1.x = 1;
   s1.z = 1;
+  s1.y = 1;
   clang_analyzer_eval(s1.x == 1); // expected-warning{{TRUE}}
   clang_analyzer_eval(s1.z == 1); // expected-warning{{TRUE}}
   // Not only passing a structure pointer through const pointer parameter,
   // but also passing a field pointer through const pointer parameter
   // should preserve the contents of the structure.
   useAnythingConst(&(s1.y));
+  clang_analyzer_eval(s1.y == 1); // expected-warning{{TRUE}}
   clang_analyzer_eval(s1.x == 1); // expected-warning{{TRUE}}
   // FIXME: Should say "UNKNOWN", because it is not uncommon to
   // modify a mutable member variable through const pointer.
   clang_analyzer_eval(s1.z == 1); // expected-warning{{TRUE}}
   useAnything(&(s1.y));
-  clang_analyzer_eval(s1.x == 1); // expected-warning{{UNKNOWN}}
+  clang_analyzer_eval(s1.x == 1); // expected-warning{{TRUE}}
+  clang_analyzer_eval(s1.y == 1); // expected-warning{{UNKNOWN}}
 }


 void useFirstConstSecondNonConst(const void *x, void *y);
 void useFirstNonConstSecondConst(void *x, const void *y);

 void testMixedConstNonConstCalls() {
   PlainStruct s2;

cfe/trunk/test/Analysis/cxx-uninitialized-object.cpp

 template <class T>
 void mayInitialize(T &);

 template <class T>
 void wontInitialize(const T &);

 class PassingToUnknownFunctionTest1 {
-  int a, b;
+  int a, b; // expected-note{{uninitialized field 'this->b'}}

 public:
   PassingToUnknownFunctionTest1() {
     mayInitialize(a);
     mayInitialize(b);
     // All good!
   }

   PassingToUnknownFunctionTest1(int) {
-    mayInitialize(a);
-    // All good!
+    mayInitialize(a); // expected-warning{{1 uninitialized field at the end of the constructor call}}
   }

   PassingToUnknownFunctionTest1(int, int) {
     mayInitialize(*this);
     // All good!
   }
 };

cfe/trunk/test/Analysis/malloc.c

@@ @@ struct IntAndPtr {
   int *p;
 };

 void constEscape(const void *ptr);

 void testConstEscapeThroughAnotherField() {
   struct IntAndPtr s;
   s.p = malloc(sizeof(int));
-  constEscape(&(s.x)); // could free s->p!
-} // no-warning
+  constEscape(&(s.x));
+} // expected-warning {{Potential leak of memory pointed to by 's.p'}}

 // PR15623
 int testNoCheckerDataPropogationFromLogicalOpOperandToOpResult(void) {
   char *param = malloc(10);
   char *value = malloc(10);
   int ok = (param && value);
   free(param);
   free(value);

cfe/trunk/test/Analysis/taint-generic.c

@@ @@ union {
     int x;
     char y[4];
   } tainted;

   char buffer[4];

   int sock = socket(AF_INET, SOCK_STREAM, 0);
   read(sock, &tainted.y, sizeof(tainted.y));
+  tainted.x = 0;
   // FIXME: overlapping regions aren't detected by isTainted yet
   __builtin_memcpy(buffer, tainted.y, tainted.x);
 }

 int testDivByZero() {
   int x;
   scanf("%d", &x);
   return 5/x; // expected-warning {{Division by a tainted value, possibly zero}}

cfe/trunk/test/Analysis/taint-tester.c

@@ @@ void taintTracking(int x) {
   int ptrtx = xyPtr->x;// expected-warning + {{tainted}}
   int ptrty = xyPtr->y;// expected-warning + {{tainted}}

   // Taint on fields of a struct.
   struct XYStruct xy = {2, 3, 11};
   scanf("%d", &xy.y);
   scanf("%d", &xy.x);
   int tx = xy.x; // expected-warning + {{tainted}}
-  int ty = xy.y; // FIXME: This should be tainted as well.
+  int ty = xy.y; // expected-warning + {{tainted}}
   char ntz = xy.z;// no warning
   // Now, scanf scans both.
   scanf("%d %d", &xy.y, &xy.x);
   int ttx = xy.x; // expected-warning + {{tainted}}
   int tty = xy.y; // expected-warning + {{tainted}}
 }

 void BitwiseOp(int in, char inn) {