This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Treat write into a top-level parameter variable with destructor as escape.
ClosedPublic

Authored by NoQ on Apr 1 2019, 8:54 PM.

Details

Summary

Writing stuff into an argument variable is usually equivalent to writing stuff to a local variable: it will have no effect outside of the function. There's an important exception from this rule: if the argument variable has a non-trivial destructor, the destructor would be invoked on the parent stack frame, exposing contents of the otherwise dead argument variable to the caller.

We've had this problem in https://bugs.llvm.org/show_bug.cgi?id=37459#c3 where we weren't invalidating argument regions after the call. Such invalidation is completely unnecessary when the argument doesn't have a destructor, but it's vital when it does.

The newly added test case demonstrates the same problem but "inside out": when we're receiving an object with a non-trivial destructor as a top-level argument, we're exposing ourselves to the destructor call of this variable which we won't ever encounter during the current analysis because it'll only happen in the parent stack frame. Such destructor may do various stuff with values we put into the variable, such as deallocating memory owned by the object, but we won't see it and report spurious leaks.

Note that the parameter variable is dead after it's referenced for the last time within the function regardless of whether it has a destructor or not. The variable is dead because we can guarantee that we'll never be able to access it throughout the rest of the analysis. It indicates that all our knowledge about the variable is final. For example, if there's a pointer stored in this variable that's allocated, and it's not stored anywhere else, it won't be deallocated until the end of the analysis. This is why it is incorrect to simply make top-level parameter variables with destructors live forever: it contradicts the performance-related purpose of dead symbol collection, even if it does play nicely with the leak-finding purpose of dead symbols collection.

Therefore i believe that the right solution is to treat any writes into top-level parameters with destructors as escapes. The value is still stored there and something that's beyond our control (in this case, a destructor call) will happen to it and we cannot predict what exactly will happen. It's a typical escape scenario.

Well, in fact, we *can* predict what happens. After all, it happens immediately after the end of the analysis and we don't need to know anything about the caller stack frame in order to evaluate these destructors. So the right right solution is to just append destructor modeling to the end of the analysis. This, however, is going to be very hard to implement because you'll have to teach the analyzer how to behave correctly with a null location context - that's going to be a looooot of crashes to sort out.

Diff Detail

Repository
rC Clang

Event Timeline

NoQ created this revision.Apr 1 2019, 8:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2019, 8:54 PM

Woah, the code looks amazing, cheers on the refactoring! I'll be honest, I'm struggling a bit with the sentence "we're now in the top frame". In order, I don't understand what does

  • we
  • now
  • in the top frame

mean. "Top-level argument" is another one -- Do we have precise definitions for there terms?

clang/test/Analysis/malloc.cpp
151 ↗(On Diff #193229)

Is this relevant? name will never be null.

NoQ marked an inline comment as done.EditedApr 2 2019, 2:30 PM

Woah, the code looks amazing, cheers on the refactoring! I'll be honest, I'm struggling a bit with the sentence "we're now in the top frame". In order, I don't understand what does

  • we
  • now
  • in the top frame

mean. "Top-level argument" is another one -- Do we have precise definitions for there terms?

Cf. LocationContext::inTopFrame().

The top frame is the StackFrameContext whose parent context is nullptr. There is only one such context and it is the root of the location context tree. It corresponds to the stack frame from which the current Analysis begins. That is, each such context corresponds to a path-sensitive analysis line in the -analyzer-display-progress dump. Other stack frame contexts (with non-null parents) correspond to function calls *during* analysis. They usually correspond to stack frames of inlined calls (since D49443 we sometimes create stack frame contexts for calls that will never be inlined, but the rough idea is still the same).

So this patch talks about the situation when we start analysis from function, say, test(A a) { ... }, and its argument a exists throughout the whole analysis: its constructor was called before the analysis has started, and its destructor will be called after the analysis has ended. Having no parent context means that we don't know anything about the caller or the call site of test(a) - the information we usually do have for inlined calls.

The phrase "We are now in the top frame" therefore roughly translates to "The CoreEngine worklist item that the ExprEngine is currently processing corresponds to an ExplodedNode that has a ProgramPoint with a LocationContext whose nearest StackFrameContext is the top frame". When i'm talking about top-level argument variables, i mean "A VarRegion whose parent region is a StackArgumentsSpaceRegion whose identity is a top-frame StackFrameContext".

Do you think i should document it somehow?

clang/test/Analysis/malloc.cpp
151 ↗(On Diff #193229)

Not really, just makes the code look a bit more sensible and idiomatic and less warning-worthy-anyway, to make it as clear as possible that the positive here is indeed false. We don't really have a constructor in this class, but we can imagine that it zero-initializes name. Without this check calling getName() multiple times would immediately result in a leak.

Szelethus accepted this revision.Apr 10 2019, 2:47 PM

Okay, I played around with this patch, I see now where this is going! LGTM!

Do you think i should document it somehow?

Aye, the description you gave was enlightening, thanks! If you can squeeze it somewhere in the code where it isn't out of place, it's all the better! :)

clang/test/Analysis/malloc.cpp
151 ↗(On Diff #193229)

Convinced ;)

This revision is now accepted and ready to land.Apr 10 2019, 2:47 PM
This revision was automatically updated to reflect the committed changes.