This is an archive of the discontinued LLVM Phabricator instance.

[NFC][analyzer] Allow CallDescriptions to be matched with CallExprs
ClosedPublic

Authored by Szelethus on Feb 4 2022, 8:23 AM.

Details

Summary

As of now, we have two interfaces to for defining signatures: StdLibraryFunctionsChecker::Signature and CallDescription. An example for how Signatures are used can be seen here:

addToFunctionSummaryMap(
    "isprint", Signature(ArgTypes{IntTy}, RetType{IntTy}),
    Summary(EvalCallAsPure)
        .Case({ArgumentCondition(0U, WithinRange, Range(32, 126)),
               ReturnValueCondition(OutOfRange, SingleValue(0))})
        .Case({ArgumentCondition(0U, OutOfRange, Range(32, 126)),
               ReturnValueCondition(WithinRange, SingleValue(0))}));

The name of the function is searched for in translation unit's identifier table, then the Signature is matched against each decl Decl with the same name. Ideally, this yields a FunctionDecl, which is mapped to its Summary.

This works well for C functions, but doesn't support C++ at all.

CallDescription emerged with a strong emphasis on recognizing C++ functions. The common example brough up is std::string, which in some standard library implementations is actually called something like std::__cxx11::basic_string, but not in others. Matching this can be a nightmare for checker developers. For this reason, CallDescriptions can be defined like this:


InnerPointerChecker()
    : AppendFn({"std", "basic_string", "append"}),
      AssignFn({"std", "basic_string", "assign"}),
      AddressofFn({"std", "addressof"}),
      ClearFn({"std", "basic_string", "clear"}),
      CStrFn({"std", "basic_string", "c_str"}), DataFn({"std", "data"}, 1),
      DataMemberFn({"std", "basic_string", "data"}),
      EraseFn({"std", "basic_string", "erase"}),
      InsertFn({"std", "basic_string", "insert"}),
      PopBackFn({"std", "basic_string", "pop_back"}),
      PushBackFn({"std", "basic_string", "push_back"}),
      ReplaceFn({"std", "basic_string", "replace"}),
      ReserveFn({"std", "basic_string", "reserve"}),
      ResizeFn({"std", "basic_string", "resize"}),
      ShrinkToFitFn({"std", "basic_string", "shrink_to_fit"}),
      SwapFn({"std", "basic_string", "swap"}) {}

Any identifier which matches at least these identifiers are considered a match (which sometimes leads to incorrect matching, e.g. D81745).

CallDescriptions are (usually) not used for digging up FunctionDecls from the translation unit, but rather during symbolic execution to check in a pre/post call event whether the called function matches the CallDescription:

bool InnerPointerChecker::isInnerPointerAccessFunction(
    const CallEvent &Call) const {
  return matchesAny(Call, CStrFn, DataFn, DataMemberFn);
}

Most of the new checkers implementing pre/post condition checks on functions now use CallDescriptionMap or CallDescriptionSet. Its up to debate whether the newer Signature approach is better, but its not obvious, and converting from one to the other may be non-trivial as well.


Now, onto this patch. Since CallDescriptions can only be matched against CallEvents that are created during symbolic execution, it was not possible to use it in syntactic-only contexts. For example, even though InnerPointerChecker can check with its set of CallDescriptions whether a function call is interested during analysis, its unable to check without hassle whether a non-analyzer piece of code also calls such a function.

The patch adds the ability to use CallDescriptions in syntactic contexts as well. While we already have that in Signature, we still want to leverage the ability to use dynamic information when we have it (function pointers, for example). This could be done with Signature as well (StdLibraryFunctionsChecker does it), but it makes it even less of a drop-in replacement.

Diff Detail

Event Timeline

Szelethus created this revision.Feb 4 2022, 8:23 AM
Szelethus requested review of this revision.Feb 4 2022, 8:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 4 2022, 8:23 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

I strongly belive that this should be an overload to the existing 'matches' API. Maybe add a comment that prefer the other overload if can. But having an overload for that alread implies this anyway. That being said, digging out a callexpr from a CallEvent and calling the callexpr overload seems to be too artifical to me to worry about.

I strongly belive that this should be an overload to the existing 'matches' API. Maybe add a comment that prefer the other overload if can. But having an overload for that alread implies this anyway.

I somewhat disagree. CallDescription is one of the interfaces that newcomers come by rather fast, and a descriptive name would be a nice piece of guidance. I am not sure what can be gained by turning this to an overload.

That being said, digging out a callexpr from a CallEvent and calling the callexpr overload seems to be too artifical to me to worry about.

Well, a tiny bit more is happening than that with the argument count -- what do you mean with this statement exactly?

Now that I remember, the ever so slightly different overloads of ProgramState::getSVal is a prime example I think. I always percieved that I have the means to invoke several of them at any point, but I never really knew which one. Though, to be fair, they were not documented particularly well (at least as I remember it).

NoQ added a comment.Feb 6 2022, 10:06 PM

The original lookup() isn't exactly precise either, it's just slightly more precise as it has better access to path-sensitive information such as current values of function pointers, but this doesn't necessarily help given that these pointers can still be unknown. And when the information is not available the lookup silently fails in both cases.

But I can certainly get behind demotivating callers from calling the new function unless they know what they're doing. Maybe lookupAsWritten() to indicate that the function intentionally ignores the runtime state of the program and looks at the syntax only?

The original lookup() isn't exactly precise either, it's just slightly more precise as it has better access to path-sensitive information such as current values of function pointers, but this doesn't necessarily help given that these pointers can still be unknown. And when the information is not available the lookup silently fails in both cases.

But I can certainly get behind demotivating callers from calling the new function unless they know what they're doing. Maybe lookupAsWritten() to indicate that the function intentionally ignores the runtime state of the program and looks at the syntax only?

I still don't see the benefit of introducing another API.
This is actually a difference between the CallEvent and a CallExpr, thus it has not much to do with the CallDescription. For choosing the right matches() overload inherently depends on knowing about the parameter, on which we overload on.
We should have this as a comment for the type CallEvent, and I'm okay with reminding users even at the matches(CallExpr) overload.
I'm still heavily against introducing a new API instead of an overload.

NoQ added a comment.Feb 7 2022, 10:10 AM

It's definitely a bug-prone scenario. I can totally see people stuffing whatever happens to be more readily available in the code into the API without thinking too much about the pros and cons. The difference between CallExpr and CallEvent is large in general but with respect to this API it's very subtle. I can easily imagine people missing it even when they know the difference between CallExpr and CallEvent in general. So I think it's worth attracting attention to.

Also technically a new overload *is* a new API. It just happens to have the same name as the old one.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
102–103
203–204

In doxygen these comments will be below :)

I think this should be a full comment so that people didn't have to click.

steakhal resigned from this revision.Feb 8 2022, 7:10 AM

I think I don't have much to add. I still haven't changed my mind, but let's go with what the majority of people want.
To make the whole stack consistent, consider mocking the variadic free function matchesAny() as well for CallExprs.

Szelethus updated this revision to Diff 407106.Feb 9 2022, 3:32 AM
  • Rename from .*Imprecise to .*AsWritten.
  • Copy comments to relevant functions.
steakhal added inline comments.Feb 9 2022, 4:36 AM
clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
102–103

ping

111

I think it's a free function. I know that copydoc did not work for this example.
Are you sure adding the ::CallDescription fixes the doc comment?

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
50

.

554

You could have define ResultMap ad a virtual base class, which would be implemented by two different classes. One of which would use the asWritten lookups, etc.
You could make_unique of the required one and inject it to the Action.

That way those tests would look just the same as the previous ones.

Szelethus updated this revision to Diff 410833.Feb 23 2022, 8:29 AM
Szelethus marked 5 inline comments as done.

Remove a newline.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
102–103

This is actually the CallEvent variant, I corrected the CallExpr one :)

111

Yes! Its still in CallDescriptions "namespace", as its defined in-class. I ran doxygen-clang and can confirm that this works, but only without line breaks (or at least I haven't figured out how to do it without it, not even with backslash)

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
554

My rationale was that reusing an already existing machinery which is already used in many of the static analyzer unit tests is far friendlier for beginners. I think fracturing the file specific stuff here would increase the barrier of entry for very little gain.

One of the things I like a lot on unit tests is that they demonstrate on a small scale a part of the analyzer's core machinery. I think dividing this up would go against that as well.

I think it deserves an accept, however, as I don't agree with the rationale I'll let someone else for doing this.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h
111

Ah, thanks!

clang/unittests/StaticAnalyzer/CallDescriptionTest.cpp
554

If you really think this way, you should port the existing tests of this file to the new format to get rid of the already existing "machinery".
I'm okay either way, but not with mixing the two approaches.

NoQ accepted this revision.Feb 23 2022, 4:36 PM

Looks great, thanks!

This revision is now accepted and ready to land.Feb 23 2022, 4:36 PM
This revision was landed with ongoing or failed builds.Mar 1 2022, 8:13 AM
This revision was automatically updated to reflect the committed changes.