Add functions for POSIX. Contrary to libc, Posix functions do not contain any
nonull attributes, thus providing nonull information through this checker adds
high value. The code in this patch is auto generated from Cppcheck's config
directory (https://github.com/danmar/cppcheck/blob/master/cfg/posix.cfg). The
list is as complete as it is in Cppcheck. I measered the effect of these newly
added summaries on real projects: tmux, redis, curl. The number of bugs has
increased by 3-6%. During these tests it turned out that some summaries were
not correct in the original config file for Cppcheck, those errors have been
corrected in this patch.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
This is cool!
Some questions:
- Have you reported those bugs to CppCheck devs? It might be useful for us as well, as they can also double-check who is right.
- This is a really large number of summaries. At this point, I wonder if it would be worth have a separate checker for them (that is utilizing the same infrastructure).
- Is it worth to have the script you used to do the conversion available for other people? Are there other summaries we might be interested in, like Qt or other popular libraries?
The checker or at least the source file should be split somehow, adding of summaries should be separated from the other parts, otherwise the file will be too long.
- Fix file descriptor params to have proper range value
- usleep 1000000 -> 999999
- Fix fstat family
Hi Gabor, thanks for the reviews! :)
Some questions:
- Have you reported those bugs to CppCheck devs? It might be useful for us as well, as they can also double-check who is right.
Yes, I just have done that:
https://github.com/danmar/cppcheck/pull/2628
https://github.com/danmar/cppcheck/pull/2629
https://github.com/danmar/cppcheck/pull/2630
Actually by reporting these bugs I realized that not all have been fixed completely here in this patch, see the latest update.
- This is a really large number of summaries. At this point, I wonder if it would be worth have a separate checker for them (that is utilizing the same infrastructure).
I was thinking about to rename this checker: StdLibraryFunctionsChecker -> LibraryFunctionsChecker. And we could add a checker option (or a subchecker) to enable these summaries for POSIX.
- Is it worth to have the script you used to do the conversion available for other people?
Well, that is just some hacky-wacky scratch of mine that I did not plan to share. But I am pasting it here in its current state, maybe that has some value for somebody in the future.
#!/usr/bin/python3 import subprocess import sys import xml.etree.ElementTree as ET from io import StringIO blacklist = [ 'strchr', 'ctime', ] whitelist = [ 'select', 'close', 'htons', 'socket', 'setsockopt', ] def decr(nr): return str(int(nr) - 1) root = ET.parse(sys.argv[1]).getroot() for function in root.findall('function'): pureness = "NoEvalCall" if function.find('pure') is not None: pureness = "EvalCallAsPure" names = function.get('name') for name in names.split(","): if name.endswith("_l"): continue # if name in blacklist: # continue # if name not in whitelist: # continue if name.startswith("std::"): continue old_stdout = sys.stdout sys.stdout = mystdout = StringIO() # Get the signature that is in a comment. Unfortunately comments are # skipped from the XML during read. sign = subprocess.run( ['/bin/grep', str(name) + '(', sys.argv[1]], stdout=subprocess.PIPE) sign = sign.stdout.decode('utf-8') sign = sign.replace( '<!--', '').replace( '-->', '').replace( '\n', '').strip() print("// " + sign) print("addToFunctionSummaryMap(\"" + str(name) + # "\",\nSummary(ArgTypes{}, RetType{}, " + pureness + ")") "\",\nSummary(" + pureness + ")") returnValue = function.find('returnValue') returnValueConstraint = None if returnValue is not None: if returnValue.text is not None: # returnValue constraint returnValueConstraint = returnValue.text if returnValueConstraint is not None: print(".Case({})".format(returnValue.text)) args = function.findall('arg') for arg in args: if arg is not None: nr = arg.get('nr') if nr is None or nr == 'any': continue nr = decr(nr) notnull = arg.find('not-null') if notnull is not None: print(".ArgConstraint(NotNull(ArgNo(" + nr + ")))") valid = arg.find('valid') if valid is not None: if valid.text.endswith(':'): l = valid.text.split(':') print( ".ArgConstraint(ArgumentCondition({}, WithinRange, Range({}, Max)))".format( nr, l[0])) else: print( ".ArgConstraint(ArgumentCondition({}, WithinRange, Range({})))".format( nr, valid.text)) minsize = arg.find('minsize') if minsize is not None: mstype = minsize.get('type') if mstype == 'argvalue': print( ".ArgConstraint(BufferSize({},{}))".format( nr, decr(minsize.get('arg')))) if mstype == 'mul': print( ".ArgConstraint(BufferSize({},{},{}))".format( nr, decr( minsize.get('arg')), decr( minsize.get('arg2')))) print(");") # Print only non-trivial summaries. sys.stdout = old_stdout if ".ArgConstraint" in mystdout.getvalue() or ".Case" in mystdout.getvalue(): print(mystdout.getvalue())
Are there other summaries we might be interested in, like Qt or other popular libraries?
Absolutely. In telecomm we are interested in openssl and libcurl, but the number of functions in the corresponding xmls is quite low compared to Posix.
Also, Posix has a pretty stable standardized interface that is not really a subject of change. However, other libs may change more frequently, so I don't think we could support them similarly to Posix.
Furthermore, Posix and Libc are essential libs used almost by all C/C++ programs in some form. The other libs are not that common, so I'd rather not add extra compilation time of this checker for supporting them.
So, I think, once we are capable of handling our own format for summaries (with our own Yaml format or by extending API-notes) then should we port those libs.
Yes, I've been thinking about to have a separate PosixFunctions.cpp to populate the posix function summaries.
With @xazax.hun we agreed that we should try our best to provide signatures for all functions. This includes two major changes:
- Add signatures to each summary here, possibly with Irrelevant types.
- Add the ability to lookup types. This will result that we can change the majority Irrelevant types to concrete types.
Note, we still cannot get rid of Irrelevant types because of platform dependent typedefs and macros. E.g. ssize_t or __CONST_SOCKADDR_ARG. Check out glibc 2.27 implementation for struct sockaddr * below. The only solution for such cases is to use an Irrelevant type.
#if defined __cplusplus || !__GNUC_PREREQ (2, 7) || !defined __USE_GNU # define __SOCKADDR_ARG struct sockaddr *__restrict # define __CONST_SOCKADDR_ARG const struct sockaddr * #else /* Add more `struct sockaddr_AF' types here as necessary. These are all the ones I found on NetBSD and Linux. */ # define __SOCKADDR_ALLTYPES \ __SOCKADDR_ONETYPE (sockaddr) \ __SOCKADDR_ONETYPE (sockaddr_at) \ __SOCKADDR_ONETYPE (sockaddr_ax25) \ __SOCKADDR_ONETYPE (sockaddr_dl) \ __SOCKADDR_ONETYPE (sockaddr_eon) \ __SOCKADDR_ONETYPE (sockaddr_in) \ __SOCKADDR_ONETYPE (sockaddr_in6) \ __SOCKADDR_ONETYPE (sockaddr_inarp) \ __SOCKADDR_ONETYPE (sockaddr_ipx) \ __SOCKADDR_ONETYPE (sockaddr_iso) \ __SOCKADDR_ONETYPE (sockaddr_ns) \ __SOCKADDR_ONETYPE (sockaddr_un) \ __SOCKADDR_ONETYPE (sockaddr_x25) # define __SOCKADDR_ONETYPE(type) struct type *__restrict __##type##__; typedef union { __SOCKADDR_ALLTYPES } __SOCKADDR_ARG __attribute__ ((__transparent_union__)); # undef __SOCKADDR_ONETYPE # define __SOCKADDR_ONETYPE(type) const struct type *__restrict __##type##__; typedef union { __SOCKADDR_ALLTYPES } __CONST_SOCKADDR_ARG __attribute__ ((__transparent_union__)); # undef __SOCKADDR_ONETYPE #endif
This patch is no longer needed because I am going to upstream the POSIX functions in groups, otherwise the patch would be hideously large.
The groups are: file handling, networking, pthread_, mq_, dbm_, regex related, sched_, time, signal, terminal related, ...