This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker: Add summaries for libc
Needs ReviewPublic

Authored by martong on May 5 2020, 10:14 AM.

Details

Summary

Add a few functions for libc. The code in this patch that adds the summaries is
auto generated from Cppcheck's config directory
(https://github.com/danmar/cppcheck/blob/master/cfg/std.cfg). Note that not all
functions are lifted in this patch. Probably another patch will follow with more
summaries for libc.

Diff Detail

Event Timeline

martong created this revision.May 5 2020, 10:14 AM

There a couple functional lines commented out, and we need more tests, so I take that the patch is a proof of concept for now? Because its pretty cool if so. :)

I think testing summaries this way can be really hard to manage in the future.
I see two possible ways forward to make this easier:
a) Make something like https://reviews.llvm.org/D78118 that will also dump the actual summary in a textual form, not only the fact that a summary was loaded and check for that textual form.
b) Make a small helper library for testing summaries. E.g., most of the buffer handling functions have similar summaries, so we could have only a couple of functions for testing buffer related summaries and we could just pass in function pointers (as templates or params) instead of duplicating code. Similarly, testing output ranges could be generalized.

I think b) might be a bit better solution.
Although I do see why some people like to keep tests simple without additional layers of abstractions, so I do not insist :)

I think testing summaries this way can be really hard to manage in the future.
I see two possible ways forward to make this easier:
a) Make something like https://reviews.llvm.org/D78118 that will also dump the actual summary in a textual form, not only the fact that a summary was loaded and check for that textual form.
b) Make a small helper library for testing summaries. E.g., most of the buffer handling functions have similar summaries, so we could have only a couple of functions for testing buffer related summaries and we could just pass in function pointers (as templates or params) instead of duplicating code. Similarly, testing output ranges could be generalized.

I think b) might be a bit better solution.
Although I do see why some people like to keep tests simple without additional layers of abstractions, so I do not insist :)

My idea about the testing is the following:
We test each individual argument constraint by itself with a test function. E.g. in case of buffer sizes we have a test function specifically for it (and it requires a debug checker to be enabled)

addToFunctionSummaryMap(
    "__buf_size_arg_constraint",
    Summary(ArgTypes{ConstVoidPtrTy, SizeTy}, RetType{IntTy},
            EvalCallAsPure)
        .ArgConstraint(
            BufferSize(/*Buffer=*/ArgNo(0), /*BufSize=*/ArgNo(1))));

And then in the test file we have the exact expectations:

int __buf_size_arg_constraint(const void *, size_t);
void test_buf_size_concrete() {
  char buf[3];                       // bugpath-note{{'buf' initialized here}}
  __buf_size_arg_constraint(buf, 4); // \
  // report-warning{{Function argument constraint is not satisfied}} \
  // bugpath-warning{{Function argument constraint is not satisfied}} \
  // bugpath-note{{Function argument constraint is not satisfied}}
}
...

This way we have a test coverage for each argument constraint. The summaries are composed from these argument constraints. There may be issues with the composition, so we do have some additional test cases for that, e.g we have a test to check the expectations when one function has more than one constraints:

int __two_constrained_args(int, int);
void test_constraints_on_multiple_args(int x, int y) {
  // State split should not happen here. I.e. x == 1 should not be evaluated
  // FALSE.
  __two_constrained_args(x, y);
  clang_analyzer_eval(x == 1); // \
  // report-warning{{TRUE}} \
  // bugpath-warning{{TRUE}} \
  // bugpath-note{{TRUE}}
  clang_analyzer_eval(y == 1); // \
  // report-warning{{TRUE}} \
  // bugpath-warning{{TRUE}} \
  // bugpath-note{{TRUE}}
}

So, we have tests for each individual building blocks and for their conjunctions, which renders the testing of all concrete summaries superfluous and bloated. It would be as if a compiler was tested for all possible inputs.
On top of all this, errors (assertions) with concrete summaries should be discovered by running the analysis on real world projects like tmux, bitcoin, etc.

NoQ added inline comments.Jun 23 2020, 2:30 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1123–1124

The three-way state split is unjustified here. Usage of abs is not a sufficient indication that the value may be 0, otherwise:

int foo(int x, int y) {
  int z = abs(y);   // Assuming 'abs' has taken branch on which y == 0...
  return x / z;     // ... we'll be forced to emit a division by zero warning here.
}

Generally, there are very few cases when state splits upon function calls are justified. The common cases are:

  • The function returns bool and finding that bool is the only reason to ever call this function. Eg., isalpha() and such.
  • The function can at any time completely unpredictably take any of the branches, in other words, taint is involved. Eg., scanf() can always fail simply because the user of the program wrote something special into stdin.
NoQ added inline comments.Jun 23 2020, 2:33 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1123–1124

returns bool

Or something that kinda resembles bool (eg., isalpha() returns a variety of different ints in practice due to its static lookup table implementation strategy but the user only cares about whether it's zero or non-zero).

NoQ added inline comments.Jun 23 2020, 3:22 AM
clang/test/Analysis/std-c-library-functions.c
231

Emm :)

martong marked 2 inline comments as done.Jun 23 2020, 7:30 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1123–1124

Alright, I agree, I'll remove these cases.

Generally speaking I realized that it is hard to create these cases, and it is very rare when we can come up with a meaningful case set for any function. So while we are at it, could you please take a look at another patch, where I add no cases just argument constraints: D82288 ?

clang/test/Analysis/std-c-library-functions.c
231

Ups, ouch :D

balazske added inline comments.Sep 15 2020, 9:27 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1123–1124

But it could make sense that if the input of the function is known before the call, we can compute the output? For abs(x) if we know that x == 0 is true the checker could assume abs(x) == 0 too. These "cases" provide data for this functionality.

martong added inline comments.Sep 17 2020, 6:37 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
1123–1124

For abs(x) if we know that x == 0 is true the checker could assume abs(x) == 0 too.

Yes, this is absolutely true.

I think, we could approach this with introducing a postcall Kind for the summaries. Currently, we already have EvalCallAsPure and NoEvalCall for evalCall. We could have let's say

enum PostCallKind { UnconditionedStateSplit, ConstrainReturnIfArgConstraintsAreSatisfied }

For the ConstrainReturnIfArgConstraintsAreSatisfied we should check if the preconditions for the Args in the Case are all evaluated to Known and if so only would we apply the constraint on the return value.
@NoQ What do you think?
@steakhal FYI.