This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Escape symbols stored into specific region after a conservative evalcall.
ClosedPublic

Authored by xazax.hun on Dec 9 2019, 1:26 PM.

Details

Summary

This patch introduced additional PointerEscape callbacks after conservative calls for output parameters.
This should not really affect the current checkers but the upcoming FuchsiaHandleChecker relies on this heavily.

Diff Detail

Event Timeline

xazax.hun created this revision.Dec 9 2019, 1:26 PM
xazax.hun marked an inline comment as done.Dec 9 2019, 1:26 PM
xazax.hun added inline comments.
clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
409

Actually, it is possible I went too far with plumbing RegionAndSymbolInvalidationTraits.

xazax.hun marked an inline comment as done.Dec 9 2019, 1:29 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
715

After some offline conversation it is very likely that we want to move the runCheckersForPointerEscape here.

The main question is, how should we get all the data?

We should know about:

  • What regions are output params.
  • What regions are considered escaped.
  • What regions have traits that prevents escaping.

Is there anything else?

NoQ added inline comments.Dec 9 2019, 1:46 PM
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
715

What regions are output params.

That should be obvious from the AST. Like, parameters of non-const pointer/reference types.

What regions are considered escaped.

Output parameter regions (as TopLevelInvalidated) and whatever's reachable from them.

What regions have traits that prevents escaping.

Currently the only trait that affects escaping (as opposed to invalidation) is TK_SuppressEscape and it is never applied to out-parameters.

Also, it looks like for example CStringChecker calls ProgramState::InvalidateRegions directly when modeling function calls. So it looks like if checkers are using evalCall, pointerEscape will not be triggered automatically. I am not sure if that would be a bug or feature :D

NoQ added a comment.Dec 9 2019, 2:10 PM

Also, it looks like for example CStringChecker calls ProgramState::InvalidateRegions directly when modeling function calls. So it looks like if checkers are using evalCall, pointerEscape will not be triggered automatically. I am not sure if that would be a bug or feature :D

I'd say it's a feature, because evalCall is the official way for checkers to opt out of babysitting.

xazax.hun updated this revision to Diff 232948.Dec 9 2019, 3:12 PM
  • Prototype a new approach.
xazax.hun marked 3 inline comments as done.Dec 9 2019, 3:15 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
622

How much do we care about the escape kind? For each symbol we need to check if it was directly passed to the callee. It is not too bad I guess, but I was wondering.

715

After some offline conversation it is very likely that we want to move the runCheckersForPointerEscape here.

Nope, we do not want to move it here. We want the pointerEscape to happen AFTER postCall callbacks.

1127

Hmm. Should I introduce a very short lived trait or do we have a better way to deal with stuff like this? A short lived map?

NoQ added a comment.Dec 9 2019, 3:18 PM

I think you don't need to smuggle WasConservative all the way up. It's implied that if the evaluation was not conservative, then the respective ExplodedNodeSet is going to be empty, as all nodes will be put directly into the worklist instead. Eg., checkPostCall isn't going to be invoked immediately after inlineCall, but only after enqueueEndOfFunction.

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
622

Dunno, just introduce a new PSK_ item and use it here. It isn't supposed to be per-symbol, it's just to notify checkers that we're in this new post-call invocation for out-parameters, so that they could opt out of the whole callback.

xazax.hun updated this revision to Diff 232955.Dec 9 2019, 4:06 PM
  • Do not track explicitly if a call was conservative.
xazax.hun retitled this revision from [analyzer][WIP] Escape symbols stored into specific region after a conservative evalcall. to [analyzer] Escape symbols stored into specific region after a conservative evalcall. .Dec 9 2019, 4:06 PM
xazax.hun updated this revision to Diff 232968.Dec 9 2019, 4:57 PM
xazax.hun edited the summary of this revision. (Show Details)
  • Added a test. More rigorous tests will come in the FuchsiaHandleChecker.
  • Added a new PSK_ entry.
NoQ added inline comments.Dec 9 2019, 5:00 PM
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

WDYT about re-using ExprEngine::escapeValue() by changing it to accept ArrayRef<SVal> instead of a single SVal?

xazax.hun marked an inline comment as done.Dec 9 2019, 5:07 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

It is not entirely the same. Here we only collect symbols from non-stack regions. There (as far as I understand) we collect all symbols. I could add a flag but at that point I wonder if it is worth the change.

NoQ added inline comments.Dec 9 2019, 5:12 PM
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

Umm, wait, right, so what are you doing in this callback to begin with? The code says "gather all symbols that are encountered as immediate values stored in traversed regions". Why not simply "gather all symbols that are traversed from parameter regions"?

649

I guess technically, for our own sanity, it's worth it to-rescan the symbols for every node in dstPostCall. But i'll be very surprised if they are actually going to yield different results for every predecessor node.

xazax.hun marked an inline comment as done.Dec 9 2019, 5:34 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

My understanding is the following but correct me if I am wrong:

int *getConjuredSymbol();

call(getConjuredSymbol());

So we have can talk about two symbols here. One symbol is what was returned by getConjuredSymbol and the other is the pointee, the symbol that we get when we dereference the result of getConjuredSymbol. And my understanding is that we want to invoke escape for the latter and not the former. ExprEngine::escapeValue invalidates the former but not the latter. The visitor here invalidates the latter but not the former. This behavior solves most of my test cases in FuchsiaHandleChecker.

xazax.hun marked an inline comment as done.Dec 9 2019, 5:37 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
649

I think it is possible. We use the state to get the pointee of some pointers, so in case the PostCall splits the state on the value of the outputs it would be reasonable to get different results.

NoQ added inline comments.Dec 9 2019, 5:40 PM
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

How about escapeValue(getSVal(getArgSVal()))? (the type for getSVal() could be obtained from the AST).

xazax.hun marked an inline comment as done.Dec 9 2019, 5:45 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
604

Hmm, I believe that could work, thanks!

xazax.hun updated this revision to Diff 232982.Dec 9 2019, 6:29 PM
  • Reuse ExprEngine::escapeValue.
NoQ added inline comments.Dec 9 2019, 6:59 PM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/ExprEngine.h
637

Dunno, should we rename to escapeValues()?

clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
615–617

I encourage Call.parameters(). This way you won't need to obtain a FuncDecl. In fact you won't even need it to be a FunctionDecl.

625

Ok, so this patch interacts with D71152 in a non-trivial manner. We should re-use the logic that decides whether an escape on bind occurs.

xazax.hun marked an inline comment as done.Dec 10 2019, 11:25 AM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
625

Yeah, I just started to look into it and some skeletons fell out.

Let's consider the following example:

void leak() {
  zx_handle_t sa, sb;
  if (zx_channel_create(0, &sa, &sb))
    return;
  zx_handle_close(sb);
}

After using the same logic we can no longer detect the leak. The reason is, after zx_channel_create both sa and sb regions are escaped (and they become escaped during the conservative call) and then after the PostCall callback, when we attached the traits to our symbols, we will trigger pointer escape immediately, and lose the traits we just added.

I am not immediately sure what to do here. I feel like the main source of the problem is the fact that zx_channel_create should not escape anything and the checker has now way to communicate that to the analyzer core. Any ideas?

xazax.hun marked an inline comment as done.Dec 10 2019, 11:49 AM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
625

Suddenly, I see four ways forward, none of which is optimal. Hopefully, you can come up with something better.

  1. Do not care about escaped local regions in this code path. So basically we would tune the invalidation down a bit.
  2. Make it possible for modeling checkers to prevent escaping. It would be great if there would be a way to do that without doing EvalCall.
  3. Require the users to annotate out parameters. And we could assume that out parameters do not escape.
  4. Do some heuristics like if a checker just attached some info in a PostCall, it is likely that we should not escape that symbol.
xazax.hun marked an inline comment as done.Dec 10 2019, 12:14 PM
xazax.hun added inline comments.
clang/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp
625

Or:

  1. Since output arguments might be more common than escapes in practice, maybe we could just not escape local regions at all by default only if there is a special annotation or a modelling check asks for it. So users would get one more tool to suppress false positives but we would not overdo invalidation by default.
NoQ added a comment.Dec 10 2019, 1:06 PM

In any case, every checker is allowed to make their own decisions about escaping. Escape on its own is not material, it's all about how the checker reacts to escapes. Say, it's up to MallocChecker to decide whether the function may or may not release memory that escapes on call.

I think a valid approach would be to simply look up the function in your CallDescriptionMap and then abort the checkPointerEscape callback when it's found.

Yet, it annoys me a bit that we didn't make everything magically work in an "out of the box" manner. Can we eliminate the first pointer escape (that happens before PostCall) but only keep the secondary escape?

In D71224#1778179, @NoQ wrote:

In any case, every checker is allowed to make their own decisions about escaping. Escape on its own is not material, it's all about how the checker reacts to escapes. Say, it's up to MallocChecker to decide whether the function may or may not release memory that escapes on call.

I think a valid approach would be to simply look up the function in your CallDescriptionMap and then abort the checkPointerEscape callback when it's found.

Yet, it annoys me a bit that we didn't make everything magically work in an "out of the box" manner. Can we eliminate the first pointer escape (that happens before PostCall) but only keep the secondary escape?

I don't think this is a good enough model currently. The problem is that, it does not play well with annotations. E.g. the checker can see a symbol escaping, but it does not have a whole lot of information how. For example, currently, there is no way to check if the output parameter through which the escape happened was annotated somehow.

NoQ added a comment.Dec 10 2019, 1:39 PM

I don't think this is a good enough model currently. The problem is that, it does not play well with annotations. E.g. the checker can see a symbol escaping, but it does not have a whole lot of information how. For example, currently, there is no way to check if the output parameter through which the escape happened was annotated somehow.

Hmm. If the function is annotated, it is hopefully "fully" annotated, or at least the programmer doesn't mind adding more annotations to it. Given that you have your CallEvent structure in checkPointerEscape, i hope you can easily see if there are any annotations at all on the function, and if so, suppress the current escape entirely. Or at least scan the annotated parameters and suppress the escape for them.

I guess it's still a problem if the *same* handle is also passed through a parameter that *cannot* be annotated (eg., as part of a structure passed into the call) and then actually getting released inside the call, but is it a real problem for you?

In D71224#1778231, @NoQ wrote:

I don't think this is a good enough model currently. The problem is that, it does not play well with annotations. E.g. the checker can see a symbol escaping, but it does not have a whole lot of information how. For example, currently, there is no way to check if the output parameter through which the escape happened was annotated somehow.

Hmm. If the function is annotated, it is hopefully "fully" annotated, or at least the programmer doesn't mind adding more annotations to it. Given that you have your CallEvent structure in checkPointerEscape, i hope you can easily see if there are any annotations at all on the function, and if so, suppress the current escape entirely. Or at least scan the annotated parameters and suppress the escape for them.

I guess it's still a problem if the *same* handle is also passed through a parameter that *cannot* be annotated (eg., as part of a structure passed into the call) and then actually getting released inside the call, but is it a real problem for you?

Yeah, this was one of my idea as well. I think one of my main concerns is that I would except the majority of the escapes are simply being output parameters and only a minority are legitimate. So I was wondering if we got the default right. Maybe a checker should do more work to get the escaping rather than more work preventing it?

NoQ added a comment.Dec 10 2019, 2:08 PM

So I was wondering if we got the default right. Maybe a checker should do more work to get the escaping rather than more work preventing it?

But that's exactly how it works right now(?) If you don't define checkPointerEscape you get no escaping, if you do extra work of defining it you get the exact amount of escaping that you want.

In D71224#1778303, @NoQ wrote:

So I was wondering if we got the default right. Maybe a checker should do more work to get the escaping rather than more work preventing it?

But that's exactly how it works right now(?) If you don't define checkPointerEscape you get no escaping, if you do extra work of defining it you get the exact amount of escaping that you want.

So basically what I am wonder/worrying about is the following:
The analyzer core will decide that the stack region is escaped and the checkers has no word about this. And from that time on the checkers have to do extra work each time there is a store or conservative call to find out if this escape corresponds to a region that was escaped earlier unwillingly (from the checker's point of view).

I think I found the main problem with the current model, at least for the FuchsiaHandleCheck.

Consider the following two snippets:

zx_handle_t *get_handle_address();
void escape_store_to_escaped_region01() {
  zx_handle_t sb;
  if (zx_channel_create(0, get_handle_address(), &sb))
    return;
  zx_handle_close(sb);
}
void leak() {
  zx_handle_t sa, sb;
  if (zx_channel_create(0, &sa, &sb))
    return;
  zx_handle_close(sb);
}

In the first one I want the first handle to be escaped in the second one I do not want it to be escaped.

With my current proposed changes the checker will receive a pointer escape callback for both but it does not have enough info to differentiate between the two cases.

If I do not act upon this kind of escape I end up reporting a false positive in the first case. If I act on this escape I end up missing a true positive in the second case.

xazax.hun updated this revision to Diff 233200.Dec 10 2019, 3:00 PM
  • First take on trying to reuse processPointerEscapedOnBind.
NoQ added a comment.Dec 10 2019, 3:11 PM

So basically what I am wonder/worrying about is the following:
The analyzer core will decide that the stack region is escaped and the checkers has no word about this.

Yup, you got me. Pre-escaped locals are indeed material and beyond the checker's control. I don't seem to have any immediate solutions. I think we could postpone the work on pre-escaped locals (until we figure out how to do them correctly) if they're not immediately necessary to you (after all i was the one who suggested it). Or ignore the problem (depending on how we do our FP vs. FN trade-off).

Consider the following two snippets:

Mm, these snippets don't have pre-escaped locals. Like, they accidentally do, but above i proposed to work around this by removing the first escape invocation (that happens during the call) and only doing it after the call. This way these locals don't have time to become pre-escaped. I think these are not a problem.

xazax.hun updated this revision to Diff 233235.Dec 10 2019, 5:05 PM
  • Rebasing after reverting the pre-escaping.
NoQ accepted this revision.Dec 10 2019, 5:12 PM

Thanks!! Nothing controversial remains here, right? :)

clang/include/clang/StaticAnalyzer/Core/CheckerManager.h
91

All three escape-on-calls are on conservatively evaluated calls only, so this name isn't very descriptive.

Maybe PSK_EscapeOutParameters?

clang/lib/StaticAnalyzer/Core/ExprEngine.cpp
2736–2738

Dunno, make_pair?

This revision is now accepted and ready to land.Dec 10 2019, 5:12 PM
This revision was automatically updated to reflect the committed changes.