This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker: Add summaries for POSIX
AbandonedPublic

Authored by martong on May 5 2020, 10:31 AM.

Details

Summary

Add functions for POSIX. Contrary to libc, Posix functions do not contain any
nonull attributes, thus providing nonull information through this checker adds
high value. The code in this patch is auto generated from Cppcheck's config
directory (https://github.com/danmar/cppcheck/blob/master/cfg/posix.cfg). The
list is as complete as it is in Cppcheck. I measered the effect of these newly
added summaries on real projects: tmux, redis, curl. The number of bugs has
increased by 3-6%. During these tests it turned out that some summaries were
not correct in the original config file for Cppcheck, those errors have been
corrected in this patch.

Diff Detail

Event Timeline

martong created this revision.May 5 2020, 10:31 AM
martong edited the summary of this revision. (Show Details)May 6 2020, 1:48 AM

This is cool!

Some questions:

  1. Have you reported those bugs to CppCheck devs? It might be useful for us as well, as they can also double-check who is right.
  2. This is a really large number of summaries. At this point, I wonder if it would be worth have a separate checker for them (that is utilizing the same infrastructure).
  3. Is it worth to have the script you used to do the conversion available for other people? Are there other summaries we might be interested in, like Qt or other popular libraries?

The checker or at least the source file should be split somehow, adding of summaries should be separated from the other parts, otherwise the file will be too long.

martong updated this revision to Diff 262366.May 6 2020, 6:50 AM
  • Fix file descriptor params to have proper range value
  • usleep 1000000 -> 999999
  • Fix fstat family

This is cool!

Hi Gabor, thanks for the reviews! :)

Some questions:

  1. Have you reported those bugs to CppCheck devs? It might be useful for us as well, as they can also double-check who is right.

Yes, I just have done that:
https://github.com/danmar/cppcheck/pull/2628
https://github.com/danmar/cppcheck/pull/2629
https://github.com/danmar/cppcheck/pull/2630
Actually by reporting these bugs I realized that not all have been fixed completely here in this patch, see the latest update.

  1. This is a really large number of summaries. At this point, I wonder if it would be worth have a separate checker for them (that is utilizing the same infrastructure).

I was thinking about to rename this checker: StdLibraryFunctionsChecker -> LibraryFunctionsChecker. And we could add a checker option (or a subchecker) to enable these summaries for POSIX.

  1. Is it worth to have the script you used to do the conversion available for other people?

Well, that is just some hacky-wacky scratch of mine that I did not plan to share. But I am pasting it here in its current state, maybe that has some value for somebody in the future.

#!/usr/bin/python3
import subprocess
import sys
import xml.etree.ElementTree as ET
from io import StringIO


blacklist = [
    'strchr',
    'ctime',
]

whitelist = [
'select',
'close',
'htons',
'socket',
'setsockopt',
]


def decr(nr):
    return str(int(nr) - 1)


root = ET.parse(sys.argv[1]).getroot()
for function in root.findall('function'):

    pureness = "NoEvalCall"
    if function.find('pure') is not None:
        pureness = "EvalCallAsPure"

    names = function.get('name')
    for name in names.split(","):

        if name.endswith("_l"):
            continue

        # if name in blacklist:
            # continue

        # if name not in whitelist:
            # continue

        if name.startswith("std::"):
            continue

        old_stdout = sys.stdout
        sys.stdout = mystdout = StringIO()

        # Get the signature that is in a comment. Unfortunately comments are
        # skipped from the XML during read.
        sign = subprocess.run(
            ['/bin/grep', str(name) + '(', sys.argv[1]],
            stdout=subprocess.PIPE)
        sign = sign.stdout.decode('utf-8')
        sign = sign.replace(
            '<!--',
            '').replace(
            '-->',
            '').replace(
            '\n',
            '').strip()
        print("// " + sign)

        print("addToFunctionSummaryMap(\"" + str(name) +
              # "\",\nSummary(ArgTypes{}, RetType{}, " + pureness + ")")
              "\",\nSummary(" + pureness + ")")

        returnValue = function.find('returnValue')
        returnValueConstraint = None
        if returnValue is not None:
            if returnValue.text is not None:  # returnValue constraint
                returnValueConstraint = returnValue.text
        if returnValueConstraint is not None:
            print(".Case({})".format(returnValue.text))

        args = function.findall('arg')
        for arg in args:
            if arg is not None:
                nr = arg.get('nr')
                if nr is None or nr == 'any':
                    continue
                nr = decr(nr)

                notnull = arg.find('not-null')
                if notnull is not None:
                    print(".ArgConstraint(NotNull(ArgNo(" + nr + ")))")

                valid = arg.find('valid')
                if valid is not None:
                    if valid.text.endswith(':'):
                        l = valid.text.split(':')
                        print(
                            ".ArgConstraint(ArgumentCondition({}, WithinRange, Range({}, Max)))".format(
                                nr, l[0]))
                    else:
                        print(
                            ".ArgConstraint(ArgumentCondition({}, WithinRange, Range({})))".format(
                                nr, valid.text))

                minsize = arg.find('minsize')
                if minsize is not None:
                    mstype = minsize.get('type')
                    if mstype == 'argvalue':
                        print(
                            ".ArgConstraint(BufferSize({},{}))".format(
                                nr, decr(minsize.get('arg'))))
                    if mstype == 'mul':
                        print(
                            ".ArgConstraint(BufferSize({},{},{}))".format(
                                nr, decr(
                                    minsize.get('arg')), decr(
                                    minsize.get('arg2'))))

        print(");")

        # Print only non-trivial summaries.
        sys.stdout = old_stdout
        if ".ArgConstraint" in mystdout.getvalue() or ".Case" in mystdout.getvalue():
            print(mystdout.getvalue())

Are there other summaries we might be interested in, like Qt or other popular libraries?

Absolutely. In telecomm we are interested in openssl and libcurl, but the number of functions in the corresponding xmls is quite low compared to Posix.
Also, Posix has a pretty stable standardized interface that is not really a subject of change. However, other libs may change more frequently, so I don't think we could support them similarly to Posix.
Furthermore, Posix and Libc are essential libs used almost by all C/C++ programs in some form. The other libs are not that common, so I'd rather not add extra compilation time of this checker for supporting them.
So, I think, once we are capable of handling our own format for summaries (with our own Yaml format or by extending API-notes) then should we port those libs.

The checker or at least the source file should be split somehow, adding of summaries should be separated from the other parts, otherwise the file will be too long.

Yes, I've been thinking about to have a separate PosixFunctions.cpp to populate the posix function summaries.

martong planned changes to this revision.May 14 2020, 7:09 AM

With @xazax.hun we agreed that we should try our best to provide signatures for all functions. This includes two major changes:

  1. Add signatures to each summary here, possibly with Irrelevant types.
  2. Add the ability to lookup types. This will result that we can change the majority Irrelevant types to concrete types.

Note, we still cannot get rid of Irrelevant types because of platform dependent typedefs and macros. E.g. ssize_t or __CONST_SOCKADDR_ARG. Check out glibc 2.27 implementation for struct sockaddr * below. The only solution for such cases is to use an Irrelevant type.

#if defined __cplusplus || !__GNUC_PREREQ (2, 7) || !defined __USE_GNU
# define __SOCKADDR_ARG		struct sockaddr *__restrict
# define __CONST_SOCKADDR_ARG	const struct sockaddr *
#else
/* Add more `struct sockaddr_AF' types here as necessary.
   These are all the ones I found on NetBSD and Linux.  */
# define __SOCKADDR_ALLTYPES \
  __SOCKADDR_ONETYPE (sockaddr) \
  __SOCKADDR_ONETYPE (sockaddr_at) \
  __SOCKADDR_ONETYPE (sockaddr_ax25) \
  __SOCKADDR_ONETYPE (sockaddr_dl) \
  __SOCKADDR_ONETYPE (sockaddr_eon) \
  __SOCKADDR_ONETYPE (sockaddr_in) \
  __SOCKADDR_ONETYPE (sockaddr_in6) \
  __SOCKADDR_ONETYPE (sockaddr_inarp) \
  __SOCKADDR_ONETYPE (sockaddr_ipx) \
  __SOCKADDR_ONETYPE (sockaddr_iso) \
  __SOCKADDR_ONETYPE (sockaddr_ns) \
  __SOCKADDR_ONETYPE (sockaddr_un) \
  __SOCKADDR_ONETYPE (sockaddr_x25)

# define __SOCKADDR_ONETYPE(type) struct type *__restrict __##type##__;
typedef union { __SOCKADDR_ALLTYPES
	      } __SOCKADDR_ARG __attribute__ ((__transparent_union__));
# undef __SOCKADDR_ONETYPE
# define __SOCKADDR_ONETYPE(type) const struct type *__restrict __##type##__;
typedef union { __SOCKADDR_ALLTYPES
	      } __CONST_SOCKADDR_ARG __attribute__ ((__transparent_union__));
# undef __SOCKADDR_ONETYPE
#endif
martong abandoned this revision.Jun 22 2020, 2:31 AM

This patch is no longer needed because I am going to upstream the POSIX functions in groups, otherwise the patch would be hideously large.
The groups are: file handling, networking, pthread_, mq_, dbm_, regex related, sched_, time, signal, terminal related, ...