This is an archive of the discontinued LLVM Phabricator instance.


Differential D60112

[analyzer] Treat write into a top-level parameter variable with destructor as escape.
ClosedPublic

Authored by NoQ on Apr 1 2019, 8:54 PM.

Details

Reviewers

dcoughlin
xazax.hun
a_sidorin
rnkovacs
mikhail.ramalho
Szelethus
baloghadamsoftware
Charusso

Commits

rG7d4694547a6b: [analyzer] Escape pointers stored into top-level parameters with destructors.
rL358321: [analyzer] Escape pointers stored into top-level parameters with destructors.
rC358321: [analyzer] Escape pointers stored into top-level parameters with destructors.

Summary

Writing stuff into an argument variable is usually equivalent to writing stuff to a local variable: it will have no effect outside of the function. There's an important exception to this rule: if the argument variable has a non-trivial destructor, the destructor would be invoked on the parent stack frame, exposing the contents of the otherwise dead argument variable to the caller.
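To make the exception concrete, here is a minimal runnable sketch. The names `Owner`, `callee`, `run_demo` and `g_frees` are made up for illustration and do not come from the patch. The callee stores a freshly allocated pointer into its by-value parameter and never frees it; the parameter's destructor, which the callee's body never mentions, releases it. The exact point at which a by-value parameter is destroyed is implementation-defined, but it happens no later than the end of the full-expression containing the call, which is all this sketch relies on.

```cpp
#include <cassert>
#include <cstdlib>

// Counts how many times the destructor actually released a buffer.
static int g_frees = 0;

// Hypothetical owner type: its destructor frees whatever 'buf' points to.
struct Owner {
  char *buf = nullptr;
  ~Owner() {
    if (buf) {
      std::free(buf);
      ++g_frees;
    }
  }
};

// The callee writes into its by-value parameter and never frees the memory.
// Looking at this function body alone, the allocation appears to leak.
void callee(Owner o) {
  o.buf = static_cast<char *>(std::malloc(16));
}

// Calls the callee once and reports how many buffers the destructor freed.
int run_demo() {
  g_frees = 0;
  callee(Owner{}); // the parameter is destroyed by the end of this statement
  return g_frees;
}
```

Calling `run_demo()` shows that the allocation made inside `callee` is released by code that sits outside of `callee`'s body, which is exactly the effect an analysis starting at `callee` cannot observe.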

We've had this problem in https://bugs.llvm.org/show_bug.cgi?id=37459#c3 where we weren't invalidating argument regions after the call. Such invalidation is completely unnecessary when the argument doesn't have a destructor, but it's vital when it does.

The newly added test case demonstrates the same problem but "inside out": when we're receiving an object with a non-trivial destructor as a top-level argument, we're exposing ourselves to the destructor call of this variable, which we won't ever encounter during the current analysis because it'll only happen in the parent stack frame. Such a destructor may do various things with the values we put into the variable, such as deallocating memory owned by the object, but we won't see it, so we report spurious leaks.

Note that the parameter variable is dead after it's referenced for the last time within the function, regardless of whether it has a destructor or not. The variable is dead because we can guarantee that we'll never be able to access it throughout the rest of the analysis. This means that all our knowledge about the variable is final. For example, if there's a pointer stored in this variable that's allocated, and it's not stored anywhere else, it won't be deallocated until the end of the analysis. This is why it is incorrect to simply make top-level parameter variables with destructors live forever: it contradicts the performance-related purpose of dead symbol collection, even if it does play nicely with the leak-finding purpose of dead symbol collection.

Therefore I believe that the right solution is to treat any writes into top-level parameters with destructors as escapes. The value is still stored there, and something that's beyond our control (in this case, a destructor call) will happen to it; we cannot predict what exactly will happen. It's a typical escape scenario.
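The resulting decision can be sketched as a small standalone model. This is a toy: `ToyRegion` and `escapesOnBind` are hypothetical names, not Clang API, and the boolean fields merely stand in for the queries the analyzer would make on a real region. The four cases follow the comment block in the ExprEngine.cpp change of this revision.

```cpp
#include <cassert>

// Hypothetical, simplified description of the location being written to.
// None of these names are real Clang API.
struct ToyRegion {
  bool isMemoryRegion;      // false: the target is not a memory region at all
  bool hasStackStorage;     // the target lives on the (modeled) stack
  bool isTopFrameParameter; // parameter of the function the analysis started in
  bool hasNonTrivialDtor;   // the parameter's type has a non-trivial destructor
  bool storeCanRepresent;   // the store is able to model the binding
};

// Returns true when a value written to R should be treated as escaped.
bool escapesOnBind(const ToyRegion &R) {
  // Cases (1) and (2): not a region, or not stack storage.
  if (!R.isMemoryRegion || !R.hasStackStorage)
    return true;
  // Case (3): a top-level parameter whose destructor runs after the analysis.
  if (R.isTopFrameParameter && R.hasNonTrivialDtor)
    return true;
  // Case (4): stack storage that the store cannot represent.
  return !R.storeCanRepresent;
}
```

In this model an ordinary local variable, or a top-level parameter of a trivially destructible type, keeps its value tracked, while the same write into a top-level parameter with a non-trivial destructor is reported as an escape.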

Well, in fact, we *can* predict what happens. After all, it happens immediately after the end of the analysis, and we don't need to know anything about the caller stack frame in order to evaluate these destructors. So the truly right solution is to just append destructor modeling to the end of the analysis. This, however, is going to be very hard to implement, because you'll have to teach the analyzer how to behave correctly with a null location context: that's going to be a lot of crashes to sort out.

Event Timeline

NoQ created this revision. Apr 1 2019, 8:54 PM

Herald added a project: Restricted Project. Apr 1 2019, 8:54 PM

Herald added subscribers: cfe-commits, jdoerfert, dkrupp and 3 others.

Woah, the code looks amazing, cheers on the refactoring! I'll be honest, I'm struggling a bit with the sentence "we're now in the top frame". In order, I don't understand what "we", "now", and "in the top frame" mean. "Top-level argument" is another one. Do we have precise definitions for these terms?

clang/test/Analysis/malloc.cpp
151	Is this relevant? `name` will never be null.

In D60112#1451198, @Szelethus wrote:

Woah, the code looks amazing, cheers on the refactoring! I'll be honest, I'm struggling a bit with the sentence "we're now in the top frame". In order, I don't understand what "we", "now", and "in the top frame" mean. "Top-level argument" is another one. Do we have precise definitions for these terms?

Cf. LocationContext::inTopFrame().

The top frame is the StackFrameContext whose parent context is nullptr. There is only one such context and it is the root of the location context tree. It corresponds to the stack frame from which the current analysis begins. That is, each such context corresponds to a path-sensitive analysis line in the -analyzer-display-progress dump. Other stack frame contexts (with non-null parents) correspond to function calls *during* analysis. They usually correspond to stack frames of inlined calls (since D49443 we sometimes create stack frame contexts for calls that will never be inlined, but the rough idea is still the same).

So this patch talks about the situation when we start analysis from a function, say, test(A a) { ... }, and its argument a exists throughout the whole analysis: its constructor was called before the analysis started, and its destructor will be called after the analysis has ended. Having no parent context means that we don't know anything about the caller or the call site of test(a), which is information we usually do have for inlined calls.

The phrase "We are now in the top frame" therefore roughly translates to "The CoreEngine worklist item that the ExprEngine is currently processing corresponds to an ExplodedNode that has a ProgramPoint with a LocationContext whose nearest StackFrameContext is the top frame". When I'm talking about top-level argument variables, I mean "A VarRegion whose parent region is a StackArgumentsSpaceRegion whose identity is a top-frame StackFrameContext".
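These definitions can be illustrated with a miniature context tree. This is a toy model under stated assumptions: `ToyContext`, `ToyParamRegion` and `isTopLevelParameter` are hypothetical names, the real Clang classes carry far more state, and the sketch assumes that the root of the tree is always a stack frame.

```cpp
#include <cassert>

// Hypothetical miniature of a location context; not the real Clang class.
struct ToyContext {
  const ToyContext *parent; // nullptr only for the root of the tree
  bool isStackFrame;        // false for nested contexts such as block scopes

  // Walk up to the nearest enclosing stack frame (possibly this context).
  // Assumes the root is a stack frame, so the loop always terminates.
  const ToyContext *nearestStackFrame() const {
    const ToyContext *C = this;
    while (!C->isStackFrame)
      C = C->parent;
    return C;
  }

  // "In the top frame": the nearest stack frame has no parent context.
  bool inTopFrame() const { return nearestStackFrame()->parent == nullptr; }
};

// Hypothetical stand-in for a parameter region identified by its stack frame.
struct ToyParamRegion {
  const ToyContext *frame;
};

// "Top-level argument variable": a parameter that belongs to the top frame.
bool isTopLevelParameter(const ToyParamRegion &P) {
  return P.frame->inTopFrame();
}
```

A nested scope inside the function where the analysis started is still in the top frame, while anything inside an inlined call is not, because its nearest stack frame has a parent.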

Do you think I should document it somehow?

clang/test/Analysis/malloc.cpp
151	Not really; it just makes the code look a bit more sensible and idiomatic, and less warning-worthy in its own right, so that it's as clear as possible that the positive here is indeed false. We don't really have a constructor in this class, but we can imagine that it zero-initializes name. Without this check, calling `getName()` multiple times would immediately result in a leak.
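The point about the null check can be seen in a small sketch. Everything here is hypothetical (`LazyName`, `toyAlloc`, `toyFree`, `outstandingAfterTwoCalls`): a bookkeeping allocator hands out slots from a fixed pool so that outstanding allocations can be counted without really leaking heap memory, and the member is assumed to be zero-initialized, as imagined in the comment above.

```cpp
#include <cassert>

// Hypothetical bookkeeping allocator backed by a fixed pool.
static char g_pool[8][10];
static int g_allocs = 0;
static int g_frees = 0;

char *toyAlloc() {
  assert(g_allocs < 8);
  return g_pool[g_allocs++];
}
void toyFree(char *) { ++g_frees; }

class LazyName {
  char *name = nullptr; // assumed zero-initialized
  bool guarded;         // whether getName() checks for an existing buffer

public:
  explicit LazyName(bool guarded) : guarded(guarded) {}

  char *getName() {
    // Without the guard, a second call overwrites (and loses) the old pointer.
    if (!guarded || !name)
      name = toyAlloc();
    return name;
  }

  ~LazyName() {
    if (name)
      toyFree(name);
  }
};

// Calls getName() twice and returns the number of allocations never freed.
int outstandingAfterTwoCalls(bool guarded) {
  g_allocs = g_frees = 0;
  {
    LazyName n(guarded);
    n.getName();
    n.getName();
  }
  return g_allocs - g_frees;
}
```

With the guard, the second call reuses the existing buffer and the destructor releases it; without the guard, the first buffer is overwritten and never released.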

Okay, I played around with this patch, I see now where this is going! LGTM!

Do you think I should document it somehow?

Aye, the description you gave was enlightening, thanks! If you can squeeze it somewhere in the code where it isn't out of place, it's all the better! :)

clang/test/Analysis/malloc.cpp
151	Convinced ;)

This revision is now accepted and ready to land. Apr 10 2019, 2:47 PM

Closed by commit rC358321: [analyzer] Escape pointers stored into top-level parameters with destructors. (authored by NoQ). Apr 12 2019, 7:00 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path                                            Size
clang/lib/StaticAnalyzer/Core/ExprEngine.cpp    64 lines
clang/test/Analysis/malloc.cpp                  23 lines

Diff 193229

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp

 [...]
   for (const auto I : AfterPreSet) {
     State = State->BindExpr(AE, LCtx, ResultVal);
     Bldr.generateNode(AE, I, State, nullptr,
                       ProgramPoint::PostStmtKind);
   }

   getCheckerManager().runCheckersForPostStmt(Dst, AfterInvalidateSet, AE, *this);
 }

-// A value escapes in three possible cases:
+// A value escapes in four possible cases:
 // (1) We are binding to something that is not a memory region.
-// (2) We are binding to a MemrRegion that does not have stack storage.
-// (3) We are binding to a MemRegion with stack storage that the store
+// (2) We are binding to a MemRegion that does not have stack storage.
+// (3) We are binding to a top-level parameter region with a non-trivial
+//     destructor. We won't see the destructor during analysis, but it's there.
+// (4) We are binding to a MemRegion with stack storage that the store
 //     does not understand.
-ProgramStateRef ExprEngine::processPointerEscapedOnBind(ProgramStateRef State,
-                                                        SVal Loc,
-                                                        SVal Val,
-                                                        const LocationContext *LCtx) {
-  // Are we storing to something that causes the value to "escape"?
-  bool escapes = true;
+ProgramStateRef
+ExprEngine::processPointerEscapedOnBind(ProgramStateRef State, SVal Loc,
+                                        SVal Val, const LocationContext *LCtx) {

-  // TODO: Move to StoreManager.
-  if (Optional<loc::MemRegionVal> regionLoc = Loc.getAs<loc::MemRegionVal>()) {
-    escapes = !regionLoc->getRegion()->hasStackStorage();
-    if (!escapes) {
-      // To test (3), generate a new state with the binding added. If it is
-      // the same state, then it escapes (since the store cannot represent
-      // the binding).
-      // Do this only if we know that the store is not supposed to generate the
-      // same state.
-      SVal StoredVal = State->getSVal(regionLoc->getRegion());
-      if (StoredVal != Val)
-        escapes = (State == (State->bindLoc(*regionLoc, Val, LCtx)));
-    }
-  }
-
-  // If our store can represent the binding and we aren't storing to something
-  // that doesn't have local storage then just return and have the simulation
-  // state continue as is.
-  if (!escapes)
-    return State;
-
-  // Otherwise, find all symbols referenced by 'val' that we are tracking
-  // and stop tracking them.
-  State = escapeValue(State, Val, PSK_EscapeOnBind);
+  // Cases (1) and (2).
+  const MemRegion *MR = Loc.getAsRegion();
+  if (!MR || !MR->hasStackStorage())
+    return escapeValue(State, Val, PSK_EscapeOnBind);
+
+  // Case (3).
+  if (const auto *VR = dyn_cast<VarRegion>(MR->getBaseRegion()))
+    if (VR->hasStackParametersStorage() && VR->getStackFrame()->inTopFrame())
+      if (const auto *RD = VR->getValueType()->getAsCXXRecordDecl())
+        if (!RD->hasTrivialDestructor())
+          return escapeValue(State, Val, PSK_EscapeOnBind);
+
+  // Case (4): in order to test that, generate a new state with the binding
+  // added. If it is the same state, then it escapes (since the store cannot
+  // represent the binding).
+  // Do this only if we know that the store is not supposed to generate the
+  // same state.
+  SVal StoredVal = State->getSVal(MR);
+  if (StoredVal != Val)
+    if (State == (State->bindLoc(loc::MemRegionVal(MR), Val, LCtx)))
+      return escapeValue(State, Val, PSK_EscapeOnBind);
+
   return State;
 }

 ProgramStateRef
 ExprEngine::notifyCheckersOfPointerEscape(ProgramStateRef State,
                                           const InvalidatedSymbols *Invalidated,
                                           ArrayRef<const MemRegion *> ExplicitRegions,
                                           const CallEvent *Call,
 [...]

clang/test/Analysis/malloc.cpp

 [...]
 char* test_cxa_demangle(const char* sym) {
 [...]
   char* funcname = (char*)malloc(funcnamesize);
   int status;
   char* ret = abi::__cxa_demangle(sym, funcname, &funcnamesize, &status);
   if (status == 0) {
     funcname = ret;
   }
   return funcname; // no-warning
 }
+
+namespace argument_leak {
+class A {
+  char *name;
+
+public:
+  char *getName() {
+    if (!name) {
+      name = static_cast<char *>(malloc(10));
+    }
+    return name;
+  }
+  ~A() {
+    if (name) {
+      delete[] name;
+    }
+  }
+};
+
+void test(A a) {
+  (void)a.getName();
+}
+} // namespace argument_leak