This is an archive of the discontinued LLVM Phabricator instance.

[clang][analyzer] Using CallDescription in StreamChecker.
ClosedPublic

Authored by balazske on Sep 18 2019, 6:47 AM.

Details

Summary

Recognization of function names is done now with the CallDescription
class instead of using IdentifierInfo. This means function name and
argument count is compared too.
A new check for filtering not global-C-functions was added.
Test was updated.

Event Timeline

balazske created this revision.Sep 18 2019, 6:47 AM
Szelethus retitled this revision from [clang][checkers] Using CallDescription in StreamChecker. to [clang][analyzer] Using CallDescription in StreamChecker..Sep 18 2019, 9:55 AM
Szelethus added reviewers: Szelethus, NoQ.
NoQ added a comment.Sep 18 2019, 2:45 PM

Thanks! Woohoo, tests!

You can go one step further to have a CallDescriptionMap, like in D62557. This would replace the whole chain of ifs with a map lookup (which is currently still a chain of ifs under the hood, but a lot less code anyway and we'll probably optimize it under the hood later).

balazske updated this revision to Diff 222138.Sep 27 2019, 5:16 AM

Using CallDescriptionMap.

It seems like this patch is diffed against your latest commit, not the master branch.

NoQ added a comment.Sep 27 2019, 6:18 PM

It seems like this patch is diffed against your latest commit, not the master branch.

Yeah, seems so. The code looks great tho, thanks!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
64

Feel free to omit the constructor entirely. We almost never have constructors in our checkers.

74–87

Are you sure these functions are actually sometimes implemented as builtins? I think they're required to be regular functions.

90–106

For most purposes it's more convenient to pass CallEvent around. The only moment when you actually need the CallExpr is when you're doing the binding for the return value, which happens once, so it's fine to call .getOriginExpr() directly at this point.

125–127

(not your fault but before i forget) This check is actually redundant.

273

(not your fault but before i forget) This condition is actually always true here.

balazske marked an inline comment as done.Sep 30 2019, 3:58 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
90–106

The CallExpr is used at many places to get arguments and other data. There is a CallEvent::getArgSVal that can be used instead but at some places CallExpr is used in other ways. I do not see much benefit of change the passed CallExpr to CallEvent.

NoQ added inline comments.Sep 30 2019, 12:49 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
90–106

at some places CallExpr is used in other ways

Can we add more convenient getters to CallEvent to help with those? 'Cause CallEvent has strictly more information than CallExpr (combining syntactic information with path-sensitive information), it's all about convenience.

I also see that CallExpr is stored in StreamState (which is something you can't do with a CallEvent because it's short-lived) but i suspect that it's actually never read from there and i doubt that it was the right solution anyway. I.e., no other checker needs that and i don't have a reason to believe that StreamChecker is special.

balazske updated this revision to Diff 222566.Oct 1 2019, 2:25 AM
  • Various code cleanups. Eval functions use CallEvent, CallExpr is removed from state.

A few nits inline, otherwise the patch is awesome, thank you!!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
113–114

Prefer using. When I wrote D68165, I spent about 10 minutes figuring out how to do it... Ah, the joys of the C++ syntax.

using FnCheck = void (StreamChecker::*)(const CallEvent &,
                                        CheckerContext &) const;
122–123

Isn't this redundant with my other inline about parameter types?

144–148

I'm not sure why we need this, is it true that *all* stream related functions return a pointer or a numerical value? Are we actually checking whether this really is a library function? If so, this looks pretty arbitrary.

176

Why the need for (void)? CheckNullSteam doesn't seem to have an LLVM_NODISCARD attribute.

balazske marked 3 inline comments as done.Oct 1 2019, 6:29 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
122–123

Probably change to isInSystemHeader or use both?

144–148

This comes from code of CStringChecker:

// Pro-actively check that argument types are safe to do arithmetic upon.
// We do not want to crash if someone accidentally passes a structure
// into, say, a C++ overload of any of these functions. We could not check
// that for std::copy because they may have arguments of other types.

Still I am not sure that the checker works correct with code that contains similar named but "arbitrary" functions.

176

I wanted to be sure to get no buildbot compile errors (where -Werror is used).

Szelethus added inline comments.Oct 1 2019, 6:41 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
144–148

Oops, meant to write that ", is it true that *all* stream related functions have only pointer or a numerical parameters?".

Szelethus added inline comments.Oct 1 2019, 6:57 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
122–123

Actually, this looks fine. How about preserving this...

144–148

...and removing this one, or changing it to an assert?

176

They actually break on this?? Let me guess, is it the windows one? :D

balazske marked 2 inline comments as done.Oct 1 2019, 7:59 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
122–123

Should be the "stream functions" always in system header? If no isInSystemHeader call is used any global function called "fopen" with 2 arguments is recognized as stream opening function. If it returns a pointer and no "fclose" is called on that we will get a resource leak warning for that code. (But for this case the user will probably not enable the checker?).

144–148

This can not be an assert because it is possible to have global functions with same name but different signature. The check can be kept to filter out such cases. The wanted stream functions have only integral or enum or pointer arguments.

NoQ added inline comments.Oct 1 2019, 12:45 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
113–114

It's actually very easy to remember, it's just an alien kiss smiley ::*)

144–148

Clang shouldn't crash even when the library definition is incorrect.

176

I'd rather not discard the return value, but allow callbacks indicate an error when something goes wrong, so that to allow them to abort evalCall().

But more importantly, note how CheckNullStream actually returns a ProgramStateRef that was meant to be transitioned into. Which means that the primary execution path actually gets cut off.

So even if buildbots didn't warn on this, i wish they did.

balazske updated this revision to Diff 222795.Oct 2 2019, 3:40 AM
balazske marked an inline comment as done.
  • Re-design of eval functions.
  • Added a C++ test with fopen-looking function.
Szelethus added inline comments.Oct 11 2019, 8:04 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
268–270

Aha, so we're no longer checking whether Sym is null, because why would we, we literally made created it as a symbol a couple lines up. Is it worth assert()ing though?

285–286

Why is there a need to use an Optional? Why not just return a nullptr? As I saw it, each time we check both whether the optional has a value and whether the state within it is null.

310–311

And here too?

balazske marked 3 inline comments as done.Oct 11 2019, 8:49 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
268–270

Probably better to have the assert.

285–286

The idea was that there are 3 kind of return values: Change to a new state, do not change to new state, or error was found. In the first and last case we should return true from evalCall but not if there is no state to change into and the analysis can proceed in default way.

Charusso requested changes to this revision.Oct 29 2019, 4:29 AM
Charusso added a reviewer: Charusso.

Could you please mark resolved issues as resolved? I would like to see what we miss, and what has been done.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
114

I prefer std::function, because it is modern.

using StreamCheck = 
    std::function<void(const StreamChecker *, const CallEvent &,
                       CheckerContext &)>;

I think it is fine with pointers, but at some point we need to modernize this.

117

Because evalFopen() is basically the OpenFileAux(Call, C);, I think we could simplify the API by removing unnecessary one-liners so here you could write {{"fopen"}, &StreamChecker::OpenFileAux, that is why the CallEvent parameter is so generic and useful.

132–133

Why do you inject this?

140

I would move this tiny identifyCall() into evalCall() as the purpose of evalCall() is the validation of the Callback in this case.

152

This Optional is not necessary. When the state is changed, you can rely on CheckerContext::isDifferent() to see whether the modeling was succeeded. Therefore you could revert the bool return values as well.

This revision now requires changes to proceed.Oct 29 2019, 4:29 AM
NoQ added inline comments.Oct 29 2019, 4:17 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
199

This is getting pretty messy, given that you've already made a transition in this function. If you do addTransition and then generateNonFatalErrorNode, then you'll get two parallel successor nodes, but you only need one. You should either delay the transition that happens immediately after CheckNullStream, or chain the two transitions together sequentially (as opposed to in parallel).

balazske updated this revision to Diff 227093.Oct 30 2019, 7:34 AM
  • Redesign again.

Its becoming a bit difficult to navigate inlines, could you please mark them as done as you address them?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
114

But its also a lot more expensive. https://blog.demofox.org/2015/02/25/avoiding-the-performance-hazzards-of-stdfunction/

std::function is able to wrap lambdas with different captures and all sorts of things like that, which comes at a cost.

balazske marked 22 inline comments as done and an inline comment as not done.Oct 30 2019, 8:22 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
90–106

In the new code CallEvent is passed.

114

Now std::function and std::bind is used. Probably more expensive but it is called once in a evalCall, that should be no problem?

140

identifyCall is removed now.

152

Optional is not used now. The functions only return if the passed state was modified (if applicable). To detect if error was generated the isDifferent is used.

199

evalFseek has now a transition to new (non-null stream) state or transition to (non-fatal) error.

NoQ accepted this revision.Oct 30 2019, 3:58 PM

Let's land this patch, given that it grew way too big for a refactoring pass, and let @balazske further clean things up later in his follow-up patches. This is already a large improvement and i'm very grateful!

For the Optional vs. bool debate: let's see. For now we have roughly the following:

bool checkNullStream(..., ProgramStateRef *State) {
  if (!NonBuggyState && BuggyState) {
    C.emitReport(..., C.generateErrorNode(BuggyState));
    return false;
  }

  if (NonBuggyState) {
    *State = NonBuggyState;
    return true;
  }

  return false;
}

void checkFseekWhence(..., ProgramStateRef *State) const {
  if (!isBuggy(State))
    return;

  C.emitReport(..., C.generateNonFatalErrorNode(State));
}

void evalFseek(...) {
  ProgramStateRef State = C.getState();
  bool StateChanged = checkNullStream(..., &State);
  if (C.isDifferent())
    return;

  checkFseekWhence(..., &State);
  if (!C.isDifferent() && StateChanged)
    C.addTransition(State);
}

In this code functions checkNullStream and checkFseekWhence are very similar. Both of them do some checking and potentially emit a report. In high-level terms, all you're trying to do is conduct two checks at the same program point.

How did these two functions end up being so completely different? What will you do if you have not 2 but, say, 10 different checks that you need to conduct? That's not a random question; checkers do tend to grow this way, so i'm pretty sure you'll eventually need to answer this question.

In ExprEngine, the class that's responsible for modeling all statements that have effect on the program state, we have a very common idiom for chaining pieces of work together. It looks roughly like this:

void modelSomethingPart1(ExplodedNode *Pred, set<ExplodedNode> *Succs);
void modelSomethingPart2(ExplodedNode *Pred, set<ExplodedNode> *Succs);
void modelSomethingPart3(ExplodedNode *Pred, set<ExplodedNode> *Succs);
...
void modelSomethingPartK(ExplodedNode *Pred, set<ExplodedNode> *Succs);


void modelSomething(ExplodedNode *Pred) {
  set<ExplodedNode> Set0 = { Pred };

  set<ExplodedNode> Set1;
  for (P : Set0)
    modelSomethingPart1(P, &Set1);
  
  set<ExplodedNode> Set2;
  for (P : Set1)
    modelSomethingPart2(P, &Set2);
  
  set<ExplodedNode> Set3;
  for (P : Set2)
    modelSomethingPart3(P, &Set3);

  ...

  set<ExplodedNode> SetK;
  for (P : SetKMinus1)
    modelSomethingPartK(P, &SetK);

  for (N : SetK)
    addTransition(N);
}

As you see, it scales nicely for any number of sub-tasks and you don't need to juggle boolean flags or optionals when you add another sub-task. Moreover, you can further split each of your sub-tasks into even smaller sub-sub-tasks in a similar manner.

In ExprEngine this idiom is completely essential due to how arbitrary most of the effects-of-statements tend to be. In checkers, however, this level of complexity has never been necessary so far because most sub-methods either produce exactly one new state, or sink completely. For checkers like this one, which do multiple checks, i recommend passing a single node around:

ExplodedNode *checkNullStream(..., ExplodedNode *Pred) {
  if (!NonBuggyState && BuggyState)
    C.emitReport(..., C.generateErrorNode(BuggyState, Pred));

  return C.addTransition(NonBuggyState, Pred);
}

ExplodedNode *checkFseekWhence(..., ExplodedNode *Pred) const {
  State = Pred->getState();
  if (!isBuggy(State))
    return Pred;
  ExplodedNode *N = C.generateNonFatalErrorNode(State);
  if (N)
    C.emitReport(..., N);
  return N;
}

void evalFseek(...) {
  ExplodedNode *N = C.getPredecessor();

  N = checkNullStream(..., N);
  if (!N)
    return;

  N = checkFseekWhence(..., N);
  if (!N)
    return;

  N = checkSomethingElse1(..., N);
  if (!N)
    return;

  N = checkSomethingElse2(..., N);
  if (!N)
    return;

  ...

  checkSomethingElseK(..., N);
}

This wastes exploded nodes a little bit but that's fine. It's much less scary than accidentally adding unnecessary state splits.

Charusso accepted this revision.Oct 30 2019, 4:09 PM
Charusso marked an inline comment as done.

My concern was the too heavy Optional and bool usage. Cool patch!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
114

There is no real overhead, yes.

This revision is now accepted and ready to land.Oct 30 2019, 4:09 PM
Szelethus accepted this revision.Oct 30 2019, 6:36 PM

Let's make it 3! Thank you so much for sticking by, I guess one of the reasons why a patch so obviously great and desired took so long is that we still didn't figure out how we imagine the CallDescriptionMap to be integrated into larger checkers :) In any case, this is a major step in the right direction, MallocChecker and some of the others should take notes.

This revision was automatically updated to reflect the committed changes.
balazske marked 3 inline comments as done.