This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][discussion] Talk about escapes
Needs ReviewPublic

Authored by xazax.hun on Dec 4 2019, 6:29 PM.

Details

Summary

I was running the FuchsiaHandleCheck over some code and tried to collect and distill some false positives due to symbol escaping not working properly for integers.

After writing down all those test cases I feel like it is not always trivial what to do, so I decided to upload and annotate them so we can discuss here in depth what is the right way to deal with them.

Diff Detail

Event Timeline

xazax.hun created this revision.Dec 4 2019, 6:29 PM
xazax.hun marked 7 inline comments as done.Dec 4 2019, 6:42 PM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
211

This one and its variants were very popular in Fuchsia.

232

Here we will give a diagnoses for a leak at the moment. There are similar patterns in Fuchsia, but it is a false positive there, because o does store the handle and close it in its dtor.

But if we cannot inline the getter, the analyzer just assumes that this code leaks.

It is really hard to tell what should we do in this case. If we really always escape the returned symbol, we might basically end up escaping every return value from every method.

243

This is the most basic case that is not working out of the box.

261

This one works.

268

This one also works.

277

This one also works.

301

This is also nasty.

NoQ added inline comments.Dec 4 2019, 6:51 PM
clang/test/Analysis/fuchsia_handle.cpp
210

This has nothing to do with symbolic regions. We can run into this problem even if it's a local variable in the current stack frame:

void foo() {
  zx_handle_t sa, sb;
  escape(&sb); // Escape *before* create!!

  zx_channel_create(0, &sa, &sb);
  zx_handle_close(sa);
  close_escaped();
}

The solution that'll obviously work would be to keep track of all regions that escaped at least once, and then not even start tracking the handle if it's getting placed into a region that causes an escape when written into or has itself escaped before, but that sounds like a huge overkill.

Lemme think. This sounds vaguely familiar but i can't immediately recall what my thoughts were last time i thought about it.

NoQ added inline comments.Dec 4 2019, 6:57 PM
clang/test/Analysis/fuchsia_handle.cpp
210

$ cat test.c

void manage(void **x);
void free_managed();

void foo() {
  void *x;
  manage(&x);
  x = malloc(1);
  free_managed();
}

$ clang --analyze test.c

test.c:8:3: warning: Potential leak of memory pointed to by 'x'
  free_managed();
  ^~~~~~~~~~~~~~
1 warning generated.

Sigh.

xazax.hun marked an inline comment as done.Dec 4 2019, 7:03 PM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

Oh, I see. Yeah this one will be fun to deal with

NoQ added inline comments.Dec 4 2019, 7:16 PM
clang/test/Analysis/fuchsia_handle.cpp
210

The rules are pretty easy though, right?

2680 // A value escapes in four possible cases:
2681 // (1) We are binding to something that is not a memory region.
2682 // (2) We are binding to a MemRegion that does not have stack storage.
2683 // (3) We are binding to a top-level parameter region with a non-trivial
2684 //     destructor. We won't see the destructor during analysis, but it's there.
2685 // (4) We are binding to a MemRegion with stack storage that the store
2686 //     does not understand.
2687 ProgramStateRef
2688 ExprEngine::processPointerEscapedOnBind(ProgramStateRef State, SVal Loc,
2689                                         SVal Val, const LocationContext *LCtx) {

Basically, locals are the only special case; writing into anything else should be an immediate escape.

We could easily update this procedure to additionally keep track of all escaped locals in the program state, and then escape all binds to previously escaped locals as well.

The checker would then have to follow the same rules.

In the worst case, manually.

But i think we should instead update the checkPointerEscape() callback to escape the values of out-parameters upon evaluating the call conservatively (if it doesn't already) and then teach the checker to mark escaped regions explicitly as escaped (as opposed to removing them from the program state), and then teach it to never transition from escaped to opened. That would be cleaner because that'll only require us to hardcode the escaping procedure once.

Or we could just make the "bool doesWriteIntoThisRegionCauseAnEscape?(Region, State)" function re-usable.

xazax.hun marked an inline comment as done.Dec 5 2019, 10:05 AM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

Yeah. I wonder if this procedure is the right place though. We do not actually see a bind in the code above.

xazax.hun marked an inline comment as done.Dec 5 2019, 10:26 AM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

Hmm, in the stack example we do see the point of invalidation (which results in an escape). That make things easier, checkers could even work that problem around if they wanted to. In the original example, however, there is no point of invalidation, the region we get is already escaped without seeing any bind later on. And there is no store into the escaped region, since zx_channel_create is evaluated conservatively, we just attach the state to the conjured symbol retroactively.

So at this point I wonder if the checker should ever track any symbols that are is bound to a non-stack region (even if we do not see the bind itself). This might circumvent most of our problems?

And for the stack example we can just make the escaped state explicit again and never transition from an escaped state to a non-escaped one.

WDYT?

xazax.hun marked an inline comment as done.Dec 5 2019, 11:04 AM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

Actually, thinking a bit more, this is more of a workaround. It does not matter if the region is on the stack or on the heap. What does matter if we did see the inception of the region (i.e.: the allocation) or not. If we store a handle on the heap but we did see the it from the very beginning we should still be able to track it properly.

NoQ added inline comments.Dec 5 2019, 1:11 PM
clang/test/Analysis/fuchsia_handle.cpp
210

Hmm, in the stack example we do see the point of invalidation (which results in an escape). That make things easier, checkers could even work that problem around if they wanted to. In the original example, however, there is no point of invalidation, the region we get is already escaped without seeing any bind later on.

When in doubt, think what would have happened during concrete execution. There *is* a point for escape-on-bind in both cases *during* the evaluation of zx_channel_create(), we're just not invoking checkPointerEscape on this point, but we should be.

Indeed, the mental model of behavior of any function that returns a value through an out-parameter would be:

  1. Come up with the value of the parameter.
  2. Write it into the provided region.

Step 2 is basically a store bind and so it should be triggering escape-on-bind - either because we're writing into an unknown memory space, or because we're writing into a "pre-escaped local" (a notion i just came up with in order to explain my example with the local). However, because the call is evaluated conservatively, these two steps are merged together and happen simultaneously. To make things worse, the first checker callback in which the value becomes available (checkPostCall) is invoked *after* the actual step 2. This leads to the paradox that we are already past step 2 when the checker is just trying to evaluate step 1.

So, our options are either to model step 2 manually in the checker, or to teach ExprEngine to trigger checkPointerEscape immediately after checkPostCall with all out-parameter values as roots.

xazax.hun marked an inline comment as done.Dec 5 2019, 1:30 PM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

I think your proposal makes sense. My only concern is understandability. How do we explain this to the users? Yeah, you just got your symbols and they are already escaped even though no one touched them since their inception. But otherwise triggering PointerEscape for out params/return values after a conservative evaluation makes sense to me. It feels a bit like redundant work and undoing what the checker just did :) But I cannot think of a better way for now.

NoQ added inline comments.Dec 5 2019, 1:51 PM
clang/test/Analysis/fuchsia_handle.cpp
301

Cf.:

int *x = malloc(sizeof(int));
if (!x) {
  // leak???
}

Basically, you can suppress the leak warning if the symbol is constrained to a constant.

On the other hand, in your case you might instead want to track concrete handles. I.e., you could change your map from "SymbolRef => HandleState" to "SVal => HandleState", update it whenever a symbol collapses to a constant and as such becomes dead, and use linear lookup with evalEq() instead of map lookup. Or you could add a separate map for concrete ints if you're sure that other kinds of concrete values will never appear (you could insist to have some fun with statements like x == y where x is a handle symbol and y is a LocAsInteger, but loc-as-integers shouldn't have been introduced to begin with).

xazax.hun marked an inline comment as done.Dec 5 2019, 2:29 PM
xazax.hun added inline comments.
clang/test/Analysis/fuchsia_handle.cpp
210

So, our options are either to model step 2 manually in the checker, or to teach ExprEngine to trigger checkPointerEscape immediately after checkPostCall with all out-parameter values as roots.

Hmm, coming back to your example:

void manage(void **x);
void free_managed();

void foo() {
  void *x;
  manage(&x);
  x = malloc(1);
  free_managed();
}

So now have an escaped symbol BEFORE the checker starts to track it. After that, we call malloc and end up with a new symbol. So even if we trigger checkPointerEscape after manage, the checker needs some special handling. Or do you mean also triggering it after malloc call as well when we do the store? So it looks like storing a set of escaped regions in the core is kind of inevitable? Also, the reason why I really like this example, since for some functions we do know that the returned value is not saved elsewhere. Malloc is a great example. Should we have a whitelist for those or should we just check if something is a system call? Or should we have some other heuristic?

But at least this way the checker might not need to store escaped state for untracked symbols (i.e. when a symbol is not stored in the GDM yet and escape is the first transition) because it will be renotified after each store.

NoQ added inline comments.Dec 5 2019, 2:40 PM
clang/test/Analysis/fuchsia_handle.cpp
210

So it looks like storing a set of escaped regions in the core is kind of inevitable?

Yeah, i think so. But that's a very small list because we only really need to put local variables into that list, and we can clean it up as soon as the variable goes out of scope.

More formally, i propose the following:

  1. Definition: A VarRegion is called pre-escaped if it resides in a StackSpaceRegion and there exists a prior point on the current execution path in which it pointer-escaped. This is, obviously, a path-sensitive property, so in order to be able to figure out whether a given variable is pre-escaped we need a state trait.
  2. Then i think we need to amend the list of four kinds of escapy regions that i posted above with the fifth rule: "(5) We are binding to a pre-escaped VarRegion".

Once this is change is implemented, this example starts working exactly like the SymRegion example. In the malloc() example, write of the malloced pointer into x is going to trigger an escape because x is a pre-escaped VarRegion. In the handle example, zx_channel_create() into &sb would cause an escape (assuming you implement the checkPointerEscape-for-out-parameters-thing) because sb is a pre-escaped VarRegion.

NoQ added inline comments.Dec 5 2019, 3:31 PM
clang/test/Analysis/fuchsia_handle.cpp
210

How do we explain this to the users? Yeah, you just got your symbols and they are already escaped even though no one touched them since their inception

The users hopefully won't notice. If they do notice, they'll probably like it. If they don't, we should probably introduce a new escape kind so that they could opt out (i.e., a new item in enum PointerEscapeKind).

It feels a bit like redundant work

With my proposal we might actually end up escaping the same symbol twice: once during conservative evaluation, once in the newly introduced checkPointerEscape invocation immediately after post-call. But i think it could be implemented to avoid this double work: only do invalidation during conservativeEvalCall but don't trigger escapes but instead remember the list of escaped symbols for later use, and then invoke checkPointerEscape in a delayed manner after checkPostCall.

This should work correctly because checkers aren't really allowed to mutate the Store in checkPostCall (we should assert that). That said, storing the list of escaped symbols for later use is going to be pretty annoying.

Just a cross reference, some of the discussed changes are implemented here: https://reviews.llvm.org/D71152

Hopefully, more will follow.