This is an archive of the discontinued LLVM Phabricator instance.

[WIP][Analyzer] find stack addresses leaked via out-params
Needs ReviewPublic

Authored by jkorous on Sep 16 2020, 3:13 PM.

Details

Summary

This is path-insensitive purely AST-based check.

At this point I'd like to get some feedback to decide whether I should polish this a bit more (split from the path-sensitive checker?, maybe a different warning?, ...) and land it as path-insensitive checker or if I should explore the path-sensitive option.

I've started working on this to gain more experience with checkers implementation - my familiarity with path-sensitive checkers is still pretty limited. My quick naive attempt of getting SVal of DeclRefExpr and checking if its MemRegion is in current stack frame didn't get very far because the returned SVal kind is undefined. My expectation (just a wild guess really) was that local variables in the current stack would be readily available as SVal-s. I'm going to explore a bit more (hoping I won't have to add every local variable defined to ProgramState for this to work).

Diff Detail

Event Timeline

jkorous created this revision.Sep 16 2020, 3:13 PM
jkorous requested review of this revision.Sep 16 2020, 3:14 PM

There are some immediate observation to the patch, like a new kind of check definitely warrants a different warning message. However, I see that this is WIP patch, so I'll try to keep this high level.

At this point I'd like to get some feedback to decide whether I should polish this a bit more (split from the path-sensitive checker?, maybe a different warning?, ...) and land it as path-insensitive checker or if I should explore the path-sensitive option.

I think this is more of a flowsensitive problem, either with dataflow (Does this assignment to this out-param reach the exit block of the function?) or preferably, as you suggest, symbolic execution (Do we exit this function with the out-parameter still being assigned to a local variable?).

I've started working on this to gain more experience with checkers implementation - my familiarity with path-sensitive checkers is still pretty limited.

Generally speaking, we solve AST problems with either an AST matcher or an AST visitor. There is a lot of smart built into them that I feel you're reimplementing here.

My quick naive attempt of getting SVal of DeclRefExpr and checking if its MemRegion is in current stack frame didn't get very far because the returned SVal kind is undefined. My expectation (just a wild guess really) was that local variables in the current stack would be readily available as SVal-s. I'm going to explore a bit more (hoping I won't have to add every local variable defined to ProgramState for this to work).

Its been a while since I implemented a pathsensitive checker, unfortunately :) You may have to wait for someone more knowledgeable to get these interesting question answered.

clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
318–329

Is there a reason behind you peeling an expression apart to find this information? It looks like you're returning the level of pointers of the type associated with the expression. Would this work?:

int getIndirectionCount(QualType T) {
  int Ret = 0;
  while (T = T->getPointeeType())
    Ret++;
  return Ret;
}
NoQ added a comment.EditedSep 18 2020, 12:30 AM

My quick naive attempt of getting SVal of DeclRefExpr and checking if its MemRegion is in current stack frame didn't get very far because the returned SVal kind is undefined.

This is roughly the right approach (though i don't fully understand why would you consider DeclRefExprs specifically). I strongly recommend debugging it rather than abandoning; your entire patch may get reduced to like 10 lines of code if done right, and it'll additionally find more bugs. It looks like either an lvalue vs. rvalue problem (you accidentally computed an unnecessary lvalue-to-rvalue cast on an uninitialized stack address) and/or timing bug (you computed a value of the variable *when* it was uninitialized; you'll get different answers in different moments of time during analysis and that's the whole point).

jkorous updated this revision to Diff 293343.Sep 21 2020, 10:56 PM

Simply check memory regions of LHS and RHS in assignments instead of AST pattern matching.

Thank you both for all the hints!

I was almost positive that I tried this trivial approach before and it didn't work (unknown memory region) - that's why I ended up with the previous messy implementation...

Now, the current implementation still has false positives in the sense that it warns at every suspicious assignment without checking what is the actual state of LHS of such assignment at the point of return from the function.

I wanted to implement what @Szelethus suggested and thought this trivial change might be all that's necessary after all (plus another patch with better diagnostics):
https://reviews.llvm.org/D88070
However it doesn't work - the CallBack::HandleBinding callback in StackAddrEscapeChecker::checkEndFunction gets called for like 1 out of 10 cases of what I would expect (although the checkEndFunction method is called).

Does that mean that the bindings aren't present when this method is called? @NoQ could you please take a look - am I holding it wrong?

clang/lib/StaticAnalyzer/Checkers/StackAddrEscapeChecker.cpp
299

I need to remove this outdated note for myself.

303

I need to rename this.

Friendly ping

NoQ added a comment.Sep 28 2020, 10:07 AM

Now, the current implementation still has false positives in the sense that it warns at every suspicious assignment without checking what is the actual state of LHS of such assignment at the point of return from the function.

That's an interesting balancing act. I can imagine correct code intentionally written like that but I also wouldn't mind this checker having an aggressive mode that warns immediately at assignment because that'll make the checker much more efficient at dodging heap invalidations.

However it doesn't work - the CallBack::HandleBinding callback in StackAddrEscapeChecker::checkEndFunction gets called for like 1 out of 10 cases of what I would expect (although the checkEndFunction method is called).

Does that mean that the bindings aren't present when this method is called? @NoQ could you please take a look - am I holding it wrong?

You can always dump() the ProgramState to see which bindings are present. You can also view the Exploded Graph to see how these bindings change over time (it's basically a dump of the state for every moment of time during analysis), i.e. if something's not present you can find out when (and ultimately why) it was removed (or never added when supposed to have been).

NoQ added a comment.Sep 28 2020, 10:13 AM

I also recommend using a specialized callback checkBind instead of pattern-matching binary operators specifically.