Index: docs/ClangStaticAnalyzer.rst
===================================================================
--- /dev/null
+++ docs/ClangStaticAnalyzer.rst
@@ -0,0 +1,184 @@
+=====================
+Clang Static Analyzer
+=====================
+
+.. contents::
+   :local:
+
+Introduction
+------------
+
+The Clang Static Analyzer is a source code analysis tool that finds bugs in C, C++, and Objective-C programs.
+
+It implements *path-sensitive*, *inter-procedural analysis* based on the *symbolic execution* technique.
+
+Currently it can be run either as a standalone tool or within Xcode.
+The standalone tool is invoked from the command line,
+and is intended to be run in tandem with a build of a codebase.
+
+The analyzer is 100% open source and is part of the Clang project.
+Like the rest of Clang, the analyzer is implemented as a C++ library
+that can be used by other tools and applications.
+
+
+Getting the Analyzer
+--------------------
+
+Pre-built binary packages for OSX are available `here `_.
+
+Alternatively you can build Clang SA from source code.
+
+User Manual
+-----------
+
+Clang Static Analyzer is supported on Linux, Mac OSX and Windows platforms.
+
+You can analyze a simple source file by invoking *clang* directly.
+
+To analyze *test.c*
+
+.. code-block:: cpp
+
+   // test.c:
+   int test(int i){
+     if (i==0)
+       return 5/i;
+     else
+       return 0;
+   }
+
+the analyzer can be directly invoked from the command line:
+
+.. code-block:: bash
+
+   clang --analyze ./test.c -Xclang -analyzer-output -Xclang text
+   ./test.c:3:13: warning: Division by zero
+       return 5/i;
+              ~^~
+   ./test.c:2:7: note: Assuming 'i' is equal to 0
+     if (i==0)
+         ^~~~
+   ./test.c:2:3: note: Taking true branch
+     if (i==0)
+     ^
+   ./test.c:3:13: note: Division by zero
+       return 5/i;
+              ~^~
+   1 warning generated.
+
+
+* `scan-build `_ utility can be used (Linux, Mac OSX, Windows) to analyze large projects from the command line. This tool integrates the analyzer with your build system.
+* `Xcode integration `_ is also available.
+* `Code Annotations `_ page describes how to annotate your source code to give extra information to the analyzer (such as a parameter cannot be null, or a function call does not return). These annotations can be used to avoid certain *false positive* reports.
+* `Frequently Asked Questions `_
+
+
+Available Checkers
+------------------
+
+For a full list see :doc:`analyzer/checkers`
+
+Checkers are delivered in packages. The following main packages are supported:
+
+* :ref:`core-checkers` Models core language features and contains general-purpose checkers such as division by zero, null pointer dereference, usage of uninitialized values, etc. *These checkers must always be switched on, as other checkers rely on them.*
+* :ref:`cplusplus-checkers` C++ Checkers.
+* :ref:`deadcode-checkers` Dead Code Checkers
+* :ref:`nullability-checkers` Objective C checkers that warn for null pointer passing and dereferencing errors.
+* :ref:`optin-checkers` Checkers for portability, performance or coding style specific rules.
+* :ref:`security-checkers` Security related checkers.
+* :ref:`unix-checkers` POSIX/Unix checkers
+* :ref:`osx-checkers` OSX Checkers
+* :ref:`alpha ` Experimental checkers that are under development. These checkers may have a high false positive rate or crash in certain cases.
+* :ref:`debug-checkers` Checkers used for debugging the analyzer.
+
+Writeups with examples of some of the bugs that the analyzer finds:
+
+* `Bug Finding With Clang: 5 Resources To Get You Started `_
+* `Finding Memory Leaks With The LLVM/Clang Static Analyzer `_
+* `Under the Microscope - The Clang Static Analyzer `_
+* `Mike Ash - Using the Clang Static Analyzer `_
+
+Development
+-----------
+Building from source code
+^^^^^^^^^^^^^^^^^^^^^^^^^
+You need to follow the standard `Clang build procedure `_.
+When the build is finished, the Clang Static Analyzer can be reached as part of the clang binary *llvm/build/bin/clang*.
+
+
+Here are some tips for configuring your build environment for (quicker) development:
+
+.. code-block:: sh
+
+   # You may want to build clang with dynamic libraries: -DBUILD_SHARED_LIBS=ON
+   # This will decrease the size of the binaries significantly and also decrease the
+   # linking time. The resulting binaries, however, will be somewhat slower.
+   # Also note that shared libs do not work on Windows.
+   # If you have the gold or lld linker installed, use one of them. They are faster than the
+   # standard ld linker. lld is even faster than gold.
+   # Add -DLLVM_USE_LINKER=lld to use lld.
+   # -DCMAKE_CXX_COMPILER=clang++, -DCMAKE_C_COMPILER=clang tells cmake to
+   # use the clang compiler instead of g++ and gcc.
+   # If you want to limit the number of link jobs use: -DLLVM_PARALLEL_LINK_JOBS=2
+   # If you want to build tests by default use: -DLLVM_BUILD_TESTS=ON
+   # If the toolchain is recent enough, build times can be further improved
+   # by splitting the debug info into separate files using:
+   # -DLLVM_USE_SPLIT_DWARF=ON
+   # More info on split-dwarf: http://www.productive-cpp.com/improving-cpp-builds-with-split-dwarf/
+   # For development it is recommended to switch on assertions: -DLLVM_ENABLE_ASSERTIONS=ON.
+   #
+   cmake \
+     -G "Ninja" \
+     -DCMAKE_BUILD_TYPE=RelWithDebInfo \
+     -DBUILD_SHARED_LIBS=ON \
+     -DLLVM_TARGETS_TO_BUILD=X86 \
+     -DLLVM_ENABLE_ASSERTIONS=ON \
+     -DCMAKE_CXX_COMPILER=clang++ \
+     -DCMAKE_C_COMPILER=clang \
+     -DLLVM_USE_LINKER=lld \
+     ../llvm
+
+
+Checker Development
+^^^^^^^^^^^^^^^^^^^
+`Checker Developer Manual `_ describes
+the architecture of Clang Static Analyzer and how to start developing your own checker.
+
+Debugging
+^^^^^^^^^
+:doc:`analyzer/DebugChecks` helps to track the internal state of the analyzer, such as *control-flow graph*, *call-graph*, *exploded graph* etc.
+
+Design Discussions
+^^^^^^^^^^^^^^^^^^
+* :doc:`analyzer/DesignDiscussions/InitializerLists` Modeling C++ standard library constructs in general and especially including modeling implementation-defined fields within C++ standard library objects.
+* :doc:`analyzer/DesignDiscussions/nullability` High level description of the nullability checks.
+* :doc:`analyzer/DesignDiscussions/IPA` Description of Inter-Procedural Analysis
+* :doc:`analyzer/DesignDiscussions/RegionStore` Representation of memory regions
+Potential Future Checkers
+^^^^^^^^^^^^^^^^^^^^^^^^^
+`Future Checkers `_ page lists potential checkers to be implemented in the Static Analyzer.
+
+Open Projects
+^^^^^^^^^^^^^
+`Open Projects `_
+page lists several implementation ideas (tasks) that would boost the analyzer's usability and power.
+
+
+Mailing Lists
+-------------
+* `cfe-dev `_ mailing list is for clang users and developers. Always use the **[analyzer]** subject in your posts.
+* `cfe-commits `_ is for sending patches.
+
+
+Reporting Bugs
+--------------
+
+We encourage users to file bug reports for any problems that they encounter.
+
+File bugs in `LLVM's Bugzilla `_ database against
+the **Clang Static Analyzer** component.
+
+We also welcome feature requests. When filing a bug report, please do the following:
+
+* Include the checker build (for prebuilt Mac OS X binaries) or the SVN revision number.
+* Provide a self-contained, reduced test case that exhibits the issue you are experiencing.
Index: docs/analyzer/DesignDiscussions/IPA.rst
===================================================================
--- /dev/null
+++ docs/analyzer/DesignDiscussions/IPA.rst
@@ -0,0 +1,386 @@
+Inlining
+========
+
+There are several options that control which calls the analyzer will consider for
+inlining. The major one is -analyzer-config ipa:
+
+  -analyzer-config ipa=none - All inlining is disabled. This is the only mode
+     available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier.
+
+  -analyzer-config ipa=basic-inlining - Turns on inlining for C functions, C++
+     static member functions, and blocks -- essentially, the calls that behave
+     like simple C function calls. This is essentially the mode used in
+     Xcode 4.4.
+
+  -analyzer-config ipa=inlining - Turns on inlining when we can confidently find
+     the function/method body corresponding to the call. (C functions, static
+     functions, devirtualized C++ methods, Objective-C class methods, Objective-C
+     instance methods when ExprEngine is confident about the dynamic type of the
+     instance).
+
+  -analyzer-config ipa=dynamic - Inline instance methods for which the type is
+     determined at runtime and we are not 100% sure that our type info is
+     correct. For virtual calls, inline the most plausible definition.
+
+  -analyzer-config ipa=dynamic-bifurcate - Same as -analyzer-config ipa=dynamic,
+     but the path is split. We inline on one branch and do not inline on the
+     other. This mode does not drop the coverage in cases when the parent class
+     has code that is only exercised when some of its methods are overridden.
+
+Currently, -analyzer-config ipa=dynamic-bifurcate is the default mode.
+
+While -analyzer-config ipa determines in general how aggressively the analyzer
+will try to inline functions, several additional options control which types of
+functions can be inlined, in an all-or-nothing way. These options use the
+analyzer's configuration table, so they are all specified as follows:
+
+    -analyzer-config OPTION=VALUE
+
+### c++-inlining ###
+
+This option controls which C++ member functions may be inlined.
+
+    -analyzer-config c++-inlining=[none | methods | constructors | destructors]
+
+Each of these modes implies that all the previous member function kinds will be
+inlined as well; it doesn't make sense to inline destructors without inlining
+constructors, for example.
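To make the member function kinds concrete, here is a minimal sketch of code that exercises all three. The `Counter` and `Logger` types and the `useBoth` function are hypothetical names invented for this illustration. The comments only restate which c++-inlining mode covers each kind of member function; they do not claim that the analyzer will actually inline any particular call.

```cpp
#include <cassert>

// Hypothetical type with a user-provided constructor and a trivial destructor.
struct Counter {
  int value;

  // Constructor: covered by the 'constructors' and 'destructors' modes.
  Counter() : value(0) {}

  // Ordinary member function: covered by 'methods' and every later mode.
  int next() { return ++value; }
};

// Hypothetical type with a non-trivial destructor.
struct Logger {
  int *sink;

  explicit Logger(int *s) : sink(s) {}

  // Destructor: covered only by the 'destructors' mode.
  ~Logger() { *sink += 1; }
};

// Uses a method, two constructors, and one non-trivial destructor.
int useBoth() {
  int events = 0;
  {
    Logger log(&events);
    Counter c;
    events += c.next(); // events becomes 1
  }                     // ~Logger runs here and adds 1 more
  return events;
}
```

With every member function treated as an opaque call, the value returned by `useBoth` would be unknown to the analyzer; the more kinds are eligible for inlining, the more of this body it can follow.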
+
+The default c++-inlining mode is 'destructors', meaning that all member
+functions with visible definitions will be considered for inlining. In some
+cases the analyzer may still choose not to inline the function.
+
+Note that under 'constructors', constructors for types with non-trivial
+destructors will not be inlined. Additionally, no C++ member functions will be
+inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining,
+regardless of the setting of the c++-inlining mode.
+
+### c++-template-inlining ###
+
+This option controls whether C++ templated functions may be inlined.
+
+    -analyzer-config c++-template-inlining=[true | false]
+
+Currently, template functions are considered for inlining by default.
+
+The motivation behind this option is that very generic code can be a source
+of false positives, either by considering paths that the caller considers
+impossible (by some unstated precondition), or by inlining some but not all
+of a deep implementation of a function.
+
+### c++-stdlib-inlining ###
+
+This option controls whether functions from the C++ standard library, including
+methods of the container classes in the Standard Template Library, should be
+considered for inlining.
+
+    -analyzer-config c++-stdlib-inlining=[true | false]
+
+Currently, C++ standard library functions are considered for inlining by
+default.
+
+The standard library functions and the STL in particular are used ubiquitously
+enough that our tolerance for false positives is even lower here. A false
+positive due to poor modeling of the STL leads to a poor user experience, since
+most users would not be comfortable adding assertions to system headers in order
+to silence analyzer warnings.
+
+### c++-container-inlining ###
+
+This option controls whether constructors and destructors of "container" types
+should be considered for inlining.
+
+    -analyzer-config c++-container-inlining=[true | false]
+
+Currently, these constructors and destructors are NOT considered for inlining
+by default.
+
+The current implementation of this setting checks whether a type has a member
+named 'iterator' or a member named 'begin'; these names are idiomatic in C++,
+with the latter specified in the C++11 standard. The analyzer currently does a
+fairly poor job of modeling certain data structure invariants of container-like
+objects. For example, these three expressions should be equivalent:
+
+    std::distance(c.begin(), c.end()) == 0
+    c.begin() == c.end()
+    c.empty()
+
+Many of these issues are avoided if containers always have unknown, symbolic
+state, which is what happens when their constructors are treated as opaque.
+In the future, we may decide specific containers are "safe" to model through
+inlining, or choose to model them directly using checkers instead.
+
+
+Basics of Implementation
+------------------------
+
+The low-level mechanism of inlining a function is handled in
+ExprEngine::inlineCall and ExprEngine::processCallExit.
+
+If the conditions are right for inlining, a CallEnter node is created and added
+to the analysis work list. The CallEnter node marks the change to a new
+LocationContext representing the called function, and its state includes the
+contents of the new stack frame. When the CallEnter node is actually processed,
+its single successor will be an edge to the first CFG block in the function.
+
+Exiting an inlined function is a bit more work, fortunately broken up into
+reasonable steps:
+
+1. The CoreEngine realizes we're at the end of an inlined call and generates a
+   CallExitBegin node.
+
+2. ExprEngine takes over (in processCallExit) and finds the return value of the
+   function, if it has one. This is bound to the expression that triggered the
+   call. (In the case of calls without origin expressions, such as destructors,
+   this step is skipped.)
+
+3. Dead symbols and bindings are cleaned out from the state, including any local
+   bindings.
+
+4. A CallExitEnd node is generated, which marks the transition back to the
+   caller's LocationContext.
+
+5. Custom post-call checks are processed and the final nodes are pushed back
+   onto the work list, so that evaluation of the caller can continue.
+
+Retry Without Inlining
+----------------------
+
+In some cases, we would like to retry analysis without inlining a particular
+call.
+
+Currently, we use this technique to recover coverage in case we stop
+analyzing a path due to exceeding the maximum block count inside an inlined
+function.
+
+When this situation is detected, we walk up the path to find the first node
+before inlining was started and enqueue it on the WorkList with a special
+ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining). The
+path is then re-analyzed from that point without inlining that particular call.
+
+Deciding When to Inline
+-----------------------
+
+In general, the analyzer attempts to inline as much as possible, since it
+provides a better summary of what actually happens in the program. There are
+some cases, however, where the analyzer chooses not to inline:
+
+- If there is no definition available for the called function or method. In
+  this case, there is no opportunity to inline.
+
+- If the CFG cannot be constructed for a called function, or the liveness
+  cannot be computed. These are prerequisites for analyzing a function body,
+  with or without inlining.
+
+- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff
+  depth. This prevents unbounded analysis due to infinite recursion, but also
+  serves as a useful cutoff for performance reasons.
+
+- If the function is variadic. This is not a hard limitation, but an engineering
+  limitation.
+
+  Tracked by: Support inlining of variadic functions
+
+- In C++, constructors are not inlined unless the destructor call will be
+  processed by the ExprEngine. Thus, if the CFG was built without nodes for
+  implicit destructors, or if the destructors for the given object are not
+  represented in the CFG, the constructor will not be inlined. (As an exception,
+  constructors for objects with trivial constructors can still be inlined.)
+  See "C++ Caveats" below.
+
+- In C++, ExprEngine does not inline custom implementations of operator 'new'
+  or operator 'delete', nor does it inline the constructors and destructors
+  associated with these. See "C++ Caveats" below.
+
+- Calls resulting in "dynamic dispatch" are specially handled. See more below.
+
+- The FunctionSummaries map stores additional information about declarations,
+  some of which is collected at runtime based on previous analyses.
+  We do not inline functions which were not profitable to inline in a different
+  context (for example, if the maximum block count was exceeded; see
+  "Retry Without Inlining").
+
+
+Dynamic Calls and Devirtualization
+----------------------------------
+
+"Dynamic" calls are those that are resolved at runtime, such as C++ virtual
+method calls and Objective-C message sends. Due to the path-sensitive nature of
+the analysis, the analyzer may be able to reason about the dynamic type of the
+object whose method is being called and thus "devirtualize" the call.
+
+This path-sensitive devirtualization occurs when the analyzer can determine what
+method would actually be called at runtime. This is possible when the type
+information is constrained enough for a simulated C++/Objective-C object that
+the analyzer can make such a decision.
+
+ == DynamicTypeInfo ==
+
+As the analyzer analyzes a path, it may accrue information to refine the
+knowledge about the type of an object. This can then be used to make better
+decisions about the target method of a call.
+
+Such type information is tracked as DynamicTypeInfo. This is path-sensitive
+data that is stored in ProgramState, which defines a mapping from MemRegions to
+an (optional) DynamicTypeInfo.
+
+If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily
+inferred from the region's type or associated symbol. Information from symbolic
+regions is weaker than from true typed regions.
+
+  EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a
+           reference "A &ref" may dynamically be a subclass of 'A'.
+
+The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo,
+updating it as information is observed along a path that can refine that type
+information for a region.
+
+  WARNING: Not all of the existing analyzer code has been retrofitted to use
+           DynamicTypeInfo, nor is it universally appropriate. In particular,
+           DynamicTypeInfo always applies to a region with all casts stripped
+           off, but sometimes the information provided by casts can be useful.
+
+
+ == RuntimeDefinition ==
+
+The basis of devirtualization is CallEvent's getRuntimeDefinition() method,
+which returns a RuntimeDefinition object. When asked to provide a definition,
+the CallEvents for dynamic calls will use the DynamicTypeInfo in their
+ProgramState to attempt to devirtualize the call. In the case of no dynamic
+dispatch, or perfectly constrained devirtualization, the resulting
+RuntimeDefinition contains a Decl corresponding to the definition of the called
+function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE.
+
+In the case of dynamic dispatch where our information is not perfect, CallEvent
+can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return
+TRUE. The RuntimeDefinition object will then also include a MemRegion
+corresponding to the object being called (i.e., the "receiver" in Objective-C
+parlance), which ExprEngine uses to decide whether or not the call should be
+inlined.
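The difference between precise and imprecise type information can be illustrated with a small C++ program that mirrors the "A obj" versus "A &ref" example above. This is only a sketch: the class and function names are invented for the illustration, and the comments describe what the text implies about each call rather than observed analyzer output.

```cpp
#include <cassert>

struct A {
  virtual ~A() = default;
  virtual int id() const { return 1; }
};

struct B : A {
  int id() const override { return 2; }
};

// 'obj' is declared by value, so its dynamic type is exactly A. This is the
// kind of call for which the type information is fully constrained.
int callOnValue() {
  A obj;
  return obj.id();
}

// 'ref' may refer to an A or to any subclass of A. Without more information
// gathered along the path, A::id is only a guess at the definition that runs.
int callOnReference(const A &ref) {
  return ref.id();
}
```

At runtime `callOnReference` returns 1 for an `A` argument and 2 for a `B` argument, which is exactly the ambiguity that a single guessed definition cannot capture.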
+
+ == Inlining Dynamic Calls ==
+
+The -analyzer-config ipa option has five different modes: none, basic-inlining,
+inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic,
+all dynamic calls are inlined, whether we are certain or not that this will
+actually be the definition used at runtime. Under -analyzer-config ipa=inlining,
+only "near-perfect" devirtualized calls are inlined*, and other dynamic calls
+are evaluated conservatively (as if no definition were available).
+
+* Currently, no Objective-C messages are inlined under
+  -analyzer-config ipa=inlining, even if we are reasonably confident of the type
+  of the receiver. We plan to enable this once we have tested our heuristics
+  more thoroughly.
+
+The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to
+"dynamic", but performs a conservative invalidation in the general virtual case
+in *addition* to inlining. The details of this are discussed below.
+
+As stated above, -analyzer-config ipa=basic-inlining does not inline any C++
+member functions or Objective-C method calls, even if they are non-virtual or
+can be safely devirtualized.
+
+
+Bifurcation
+-----------
+
+ExprEngine::BifurcateCall implements the -analyzer-config ipa=dynamic-bifurcate
+mode.
+
+When a call is made on an object with imprecise dynamic type information
+(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine
+bifurcates the path and marks the object's region (retrieved from the
+RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState.
+
+Currently, there are 2 modes:
+
+  DynamicDispatchModeInlined - Models the case where the dynamic type information
+     of the receiver (MemRegion) is assumed to be perfectly constrained so
+     that a given definition of a method is expected to be the code actually
+     called. When this mode is set, ExprEngine uses the Decl from
+     RuntimeDefinition to inline any dynamically dispatched call sent to this
+     receiver because the function definition is considered to be fully resolved.
+
+  DynamicDispatchModeConservative - Models the case where the dynamic type
+     information is assumed to be incorrect, for example, because the method
+     definition is overridden in a subclass. In such cases, ExprEngine does not
+     inline the methods sent to the receiver (MemRegion), even if a candidate
+     definition is available. This mode is conservative about simulating the
+     effects of a call.
+
+Going forward along the symbolic execution path, ExprEngine consults the mode
+of the receiver's MemRegion to make decisions on whether the calls should be
+inlined or not, which ensures that there is at most one split per region.
+
+At a high level, "bifurcation mode" allows for increased semantic coverage in
+cases where the parent method contains code which is only executed when the
+class is subclassed. The disadvantages of this mode are a (considerable?)
+performance hit and the possibility of false positives on the path where the
+conservative mode is used.
+
+Objective-C Message Heuristics
+------------------------------
+
+ExprEngine relies on a set of heuristics to partition the set of Objective-C
+method calls into those that require bifurcation and those that do not. Below
+are the cases when the DynamicTypeInfo of the object is considered precise
+(cannot be a subclass):
+
+  - If the object was created with +alloc or +new and initialized with an -init
+    method.
+
+  - If the calls are property accesses using dot syntax. This is based on the
+    assumption that children rarely override properties, or do so in an
+    essentially compatible way.
+
+  - If the class interface is declared inside the main source file. In this case
+    it is unlikely that it will be subclassed.
+
+  - If the method is not declared outside of the main source file, either by the
+    receiver's class or by any superclasses.
+
+C++ Caveats
+-----------
+
+C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is
+being constructed or destructed; that is, the type of the object depends on
+which base constructors have been completed. This is tracked using
+DynamicTypeInfo in the DynamicTypePropagation checker.
+
+There are several limitations in the current implementation:
+
+- Temporaries are poorly modeled right now because we're not confident in the
+  placement of their destructors in the CFG. We currently won't inline their
+  constructors unless the destructor is trivial, and don't process their
+  destructors at all, not even to invalidate the region.
+
+- 'new' is poorly modeled due to some nasty CFG/design issues. This is tracked
+  in PR12014. 'delete' is not modeled at all.
+
+- Arrays of objects are modeled very poorly right now. ExprEngine currently
+  only simulates the first constructor and first destructor. Because of this,
+  ExprEngine does not inline any constructors or destructors for arrays.
+
+
+CallEvent
+=========
+
+A CallEvent represents a specific call to a function, method, or other body of
+code. It is path-sensitive, containing both the current state (ProgramStateRef)
+and stack space (LocationContext), and provides uniform access to the argument
+values and return type of a call, no matter how the call is written in the
+source or what sort of code body is being invoked.
+
+  NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to
+        NSInvocation.
+
+CallEvent should be used whenever there is logic dealing with function calls
+that does not care how the call occurred.
+
+Examples include checking that arguments satisfy preconditions (such as
+__attribute__((nonnull))), and attempting to inline a call.
+
+CallEvents are reference-counted objects managed by a CallEventManager. While
+there is no inherent issue with persisting them (say, in a ProgramState's GDM),
+they are intended for short-lived use, and can be recreated from CFGElements or
+non-top-level StackFrameContexts fairly easily.
Index: docs/analyzer/DesignDiscussions/InitializerLists.rst
===================================================================
--- docs/analyzer/DesignDiscussions/InitializerLists.rst
+++ docs/analyzer/DesignDiscussions/InitializerLists.rst
@@ -1,3 +1,6 @@
+================
+Initializer List
+================
 This discussion took place in https://reviews.llvm.org/D35216
 "Escape symbols when creating std::initializer_list".
 
Index: docs/analyzer/DesignDiscussions/RegionStore.rst
===================================================================
--- /dev/null
+++ docs/analyzer/DesignDiscussions/RegionStore.rst
@@ -0,0 +1,174 @@
+============
+Region Store
+============
+The analyzer "Store" represents the contents of memory regions. It is an opaque
+functional data structure stored in each ProgramState; the only class that can
+modify the store is its associated StoreManager.
+
+Currently (Feb. 2013), the only StoreManager implementation being used is
+RegionStoreManager. This store records bindings to memory regions using a "base
+region + offset" key. (This allows `*p` and `p[0]` to map to the same location,
+among other benefits.)
+
+Regions are grouped into "clusters", which roughly correspond to "regions with
+the same base region". This allows certain operations to be more efficient,
+such as invalidation.
+
+Regions that do not have a known offset use a special "symbolic" offset. These
+keys store both the original region, and the "concrete offset region" -- the
+last region whose offset is entirely concrete. (For example, in the expression
+`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`,
+since that has a known offset from the start of the top-level `foo` struct.)
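The "base region + offset" idea can be sketched with a toy model in C++. This is not the analyzer's data structure: `ToyKey`, `keyForDeref`, `keyForElement`, and `storeThenLoad` are hypothetical names invented here, the base region is stood in for by a raw address, and only concrete offsets are modeled. The sketch shows why `*p` and `p[0]` end up naming the same binding.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <utility>

// Hypothetical key: (address of the base region, byte offset into it).
using ToyKey = std::pair<const void *, std::int64_t>;

// Key for the location named by `*p`.
ToyKey keyForDeref(const int *p) { return ToyKey(p, 0); }

// Key for the location named by `p[index]`.
ToyKey keyForElement(const int *p, std::int64_t index) {
  return ToyKey(p, index * static_cast<std::int64_t>(sizeof(int)));
}

// Stores a value through `p[0]` and reads it back through `*p`.
int storeThenLoad() {
  int base[4] = {0, 0, 0, 0};
  std::map<ToyKey, int> store;
  store[keyForElement(base, 0)] = 42; // bind via p[0]
  return store.at(keyForDeref(base)); // look up via *p
}
```

Because both spellings reduce to the same (base, offset) pair, no separate aliasing step is needed to see that a write through one is visible through the other.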
+ + +Binding Invalidation +-------------------- + +Supporting both concrete and symbolic offsets makes things a bit tricky. Here's +an example: + + foo[0] = 0; + foo[1] = 1; + foo[i] = i; + +After the third assignment, nothing can be said about the value of `foo[0]`, +because `foo[i]` may have overwritten it! Thus, *binding to a region with a +symbolic offset invalidates the entire concrete offset region.* We know +`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything +else, but we do have to be conservative about all other bindings within `foo`. + +Continuing the example: + + foo[i] = i; + foo[0] = 0; + +After this latest assignment, nothing can be said about the value of `foo[i]`, +because `foo[0]` may have overwritten it! *Binding to a region R with a +concrete offset invalidates any symbolic offset bindings whose concrete offset +region is a super-region **or** sub-region of R.* All we know about `foo[i]` is +that it is somewhere within `foo`, so changing *anything* within `foo` might +change `foo[i]`, and changing *all* of `foo` (or its base region) will +*definitely* change `foo[i]`. + +This logic could be improved by using the current constraints on `i`, at the +cost of speed. The latter case could also be improved by matching region kinds, +i.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i` +is. + +For more detail, read through RegionStoreManager::removeSubRegionBindings in +RegionStore.cpp. + + +ObjCIvarRegions +--------------- + +Objective-C instance variables require a bit of special handling. Like struct +fields, they are not base regions, and when their parent object region is +invalidated, all the instance variables must be invalidated as well. However, +they have no concrete compile-time offsets (in the modern, "non-fragile" +runtime), and so cannot easily be represented as an offset from the start of +the object in the analyzer. 
Moreover, this means that invalidating a single +instance variable should *not* invalidate the rest of the object, since unlike +struct fields or array elements there is no way to perform pointer arithmetic +to access another instance variable. + +Consequently, although the base region of an ObjCIvarRegion is the entire +object, RegionStore offsets are computed from the start of the instance +variable. Thus it is not valid to assume that all bindings with non-symbolic +offsets start from the base region! + + +Region Invalidation +------------------- + +Unlike binding invalidation, region invalidation occurs when the entire +contents of a region may have changed---say, because it has been passed to a +function the analyzer can model, like memcpy, or because its address has +escaped, usually as an argument to an opaque function call. In these cases we +need to throw away not just all bindings within the region itself, but within +its entire cluster, since neighboring regions may be accessed via pointer +arithmetic. + +Region invalidation typically does even more than this, however. Because it +usually represents the complete escape of a region from the analyzer's model, +its *contents* must also be transitively invalidated. (For example, if a region +'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have +changed as well.) The algorithm that traverses this transitive closure of +accessible regions is known as ClusterAnalysis, and is also used for finding +all live bindings in the store (in order to throw away the dead ones). The name +"ClusterAnalysis" predates the cluster-based organization of bindings, but +refers to the same concept: during invalidation and liveness analysis, all +bindings within a cluster must be treated in the same way for a conservative +model of program behavior. + + +Default Bindings +---------------- + +Most bindings in RegionStore are simple scalar values -- integers and pointers. +These are known as "Direct" bindings. 
However, RegionStore supports a second +type of binding called a "Default" binding. These are used to provide values to +all the elements of an aggregate type (struct or array) without having to +explicitly specify a binding for each individual element. + +When there is no Direct binding for a particular region, the store manager +looks at each super-region in turn to see if there is a Default binding. If so, +this value is used as the value of the original region. The search ends when +the base region is reached, at which point the RegionStore will pick an +appropriate default value for the region (usually a symbolic value, but +sometimes zero, for static data, or "uninitialized", for stack variables). + + int manyInts[10]; + manyInts[1] = 42; // Creates a Direct binding for manyInts[1]. + print(manyInts[1]); // Retrieves the Direct binding for manyInts[1]; + print(manyInts[0]); // There is no Direct binding for manyInts[0]. + // Is there a Default binding for the entire array? + // There is not, but it is a stack variable, so we use + // "uninitialized" as the default value (and emit a + // diagnostic!). + +NOTE: The fact that bindings are stored as a base region plus an offset limits +the Default Binding strategy, because in C aggregates can contain other +aggregates. In the current implementation of RegionStore, there is no way to +distinguish a Default binding for an entire aggregate from a Default binding +for the sub-aggregate at offset 0. + + +Lazy Bindings (LazyCompoundVal) +------------------------------- + +RegionStore implements an optimization for copying aggregates (structs and +arrays) called "lazy bindings", implemented using a special SVal called +LazyCompoundVal. When the store is asked for the "binding" for an entire +aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a +LazyCompoundVal instead. When this value is then stored into a variable, it is +bound as a Default value. 
This makes copying arrays and structs much cheaper +than if they had required memberwise access. + +Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region, +store), representing "the value of the region during this 'snapshot' of the +store". This has important implications for any sort of liveness or +reachability analysis, which must take the bindings in the old store into +account. + +Retrieving a value from a lazy binding happens in the same way as any other +Default binding: since there is no direct binding, the store manager falls back +to super-regions to look for an appropriate default binding. LazyCompoundVal +differs from a normal default binding, however, in that it contains several +different values, instead of one value that will appear several times. Because +of this, the store manager has to reconstruct the subregion chain on top of the +LazyCompoundVal region, and look up *that* region in the previous store. + +Here's a concrete example:: + + CGPoint p; + p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'. + CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a + // snapshot of the current store state. This value is then + // used as a Default binding for the VarRegion 'p2'. + return p2.x; // The binding for FieldRegion 'p2.x' is requested. + // There is no Direct binding, so we look for a Default + // binding to 'p2' and find the LCV. + // Because it's an LCV, we look at our requested region + // and see that it's the '.x' field. We ask for the value + // of 'p.x' within the snapshot, and get back 42. Index: docs/analyzer/DesignDiscussions/nullability.rst =================================================================== --- /dev/null +++ docs/analyzer/DesignDiscussions/nullability.rst @@ -0,0 +1,92 @@ +================== +Nullability Checks +================== + +This document is a high-level description of the nullability checks.
+These checks are intended to use the annotations that are described in this +RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html. + +Let's consider the following 2 categories: + +1) nullable +============ + +If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases: +- 'p' gets implicitly converted into a nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter. +- 'p' gets dereferenced. + +Taking a branch on a nullable pointer is the same as taking a branch on a null unspecified pointer. + +Explicit cast from nullable to nonnull:: + + __nullable id foo; + id bar = foo; + takesNonNull((_nonnull) bar); <- should not warn here (backward compatibility hack) + anotherTakesNonNull(bar); <- would be great to warn here, but not necessary(*) + +Because bar corresponds to the same symbol all the time, it is not easy to implement the checker in such a way that the cast suppresses the warning for the first call but not for the second. For this reason, in the first implementation, after a contradictory cast happens I will treat bar as nullable unspecified; this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has the advantage that, in case the takesNonNull function body is being inlined, there will be no warning when the symbol is dereferenced. In case I have time after the initial version, I might spend additional time to try to find a more sophisticated solution, in which we would produce the second warning (*). + +2) nonnull +============ + +- Dereferencing a nonnull pointer, or sending a message to it, is OK. +- Converting nonnull to nullable is OK. +- When there is an explicit cast from nonnull to nullable I will trust the cast (it is probably there for a reason, because this cast does not suppress any warnings or errors).
+- But what should we do about null checks?:: + + __nonnull id takesNonnull(__nonnull id x) { + if (x == nil) { + // Defensive backward compatible code: + .... + return nil; <- Should the analyzer cover this piece of code? Should we require the cast (__nonnull)nil? + } + .... + } + +There are two possible directions: +- We can take the branch; this way the branch is analyzed. + - Should we not warn about any nullability issues in that branch? Probably not, it is OK to break the nullability postconditions when the nullability preconditions are violated. +- We can assume that these pointers are not null, and we lose coverage with the analyzer. (This can be implemented either in the constraint solver or in the checker itself.) + +Other issues to keep in mind/take care of: +Messaging: +- Sending a message to a nullable pointer + - Even though the method might return a nonnull pointer, when it was sent to a nullable pointer the return type will be nullable. + - The result is nullable unless the receiver is known to be non null. +- Sending a message to an unspecified or nonnull pointer + - If the pointer is not assumed to be nil, we should be optimistic and use the nullability implied by the method. + - This will not happen automatically, since the AST will have null unspecified in this case. + +Inlining +============ + +A symbol may need to be treated differently inside an inlined body. For example, consider these conversions from nonnull to nullable in the presence of inlining:: + + id obj = getNonnull(); + takesNullable(obj); + takesNonnull(obj); + + void takesNullable(nullable id obj) { + obj->ivar // we should assume obj is nullable and warn here + } + +With no special treatment, when takesNullable is inlined the analyzer will not warn when the obj symbol is dereferenced. One solution for this is to reanalyze takesNullable as a top level function to get possible violations.
The alternative method, deducing nullability information from the arguments after inlining, is not robust enough (for example, there might be several parameters with different nullability, but on the given path two parameters might end up being the same symbol, or there can be nested functions that take a different view of the nullability of the same symbol). So the symbol will remain nonnull to avoid false positives, but the functions that take nullable parameters will also be analyzed separately without inlining. + +Annotations on multi level pointers +=================================== + +Tracking multiple levels of annotations for pointers pointing to pointers would make the checker more complicated, because a vector of nullability qualifiers would need to be tracked for each symbol. This is not a big caveat, since once the top level pointer is dereferenced, the symbol for the inner pointer will have the nullability information. The lack of multi level annotation tracking is only observable when multiple levels of pointers are passed to a function which has a parameter with multiple levels of annotations. So for now the checker supports the top level nullability qualifiers only:: + + int * __nonnull * __nullable p; + int ** q = p; + takesStarNullableStarNullable(q); + +Implementation notes +==================== + +What to track? +- The checker would track memory regions, and to each relevant region qualifier information would be attached, which is either nullable, nonnull or null unspecified (or contradicted, to suppress warnings for a specific region). +- On a branch where a nullable pointer is known to be non null, the checker treats it the same way as a pointer annotated as nonnull. +- When there is an explicit cast from a null unspecified to either nonnull or nullable I will trust the cast. +- Unannotated pointers are treated the same way as pointers annotated with the nullability unspecified qualifier, unless the region is wrapped in ASSUME_NONNULL macros.
+- We might want to implement a callback for entry points to top level functions, where the pointer nullability assumptions would be made. Index: docs/analyzer/IPA.txt =================================================================== --- docs/analyzer/IPA.txt +++ /dev/null @@ -1,386 +0,0 @@ -Inlining -======== - -There are several options that control which calls the analyzer will consider for -inlining. The major one is -analyzer-config ipa: - - -analyzer-config ipa=none - All inlining is disabled. This is the only mode - available in LLVM 3.1 and earlier and in Xcode 4.3 and earlier. - - -analyzer-config ipa=basic-inlining - Turns on inlining for C functions, C++ - static member functions, and blocks -- essentially, the calls that behave - like simple C function calls. This is essentially the mode used in - Xcode 4.4. - - -analyzer-config ipa=inlining - Turns on inlining when we can confidently find - the function/method body corresponding to the call. (C functions, static - functions, devirtualized C++ methods, Objective-C class methods, Objective-C - instance methods when ExprEngine is confident about the dynamic type of the - instance). - - -analyzer-config ipa=dynamic - Inline instance methods for which the type is - determined at runtime and we are not 100% sure that our type info is - correct. For virtual calls, inline the most plausible definition. - - -analyzer-config ipa=dynamic-bifurcate - Same as -analyzer-config ipa=dynamic, - but the path is split. We inline on one branch and do not inline on the - other. This mode does not drop the coverage in cases when the parent class - has code that is only exercised when some of its methods are overridden. - -Currently, -analyzer-config ipa=dynamic-bifurcate is the default mode. - -While -analyzer-config ipa determines in general how aggressively the analyzer -will try to inline functions, several additional options control which types of -functions can inlined, in an all-or-nothing way. 
These options use the -analyzer's configuration table, so they are all specified as follows: - - -analyzer-config OPTION=VALUE - -### c++-inlining ### - -This option controls which C++ member functions may be inlined. - - -analyzer-config c++-inlining=[none | methods | constructors | destructors] - -Each of these modes implies that all the previous member function kinds will be -inlined as well; it doesn't make sense to inline destructors without inlining -constructors, for example. - -The default c++-inlining mode is 'destructors', meaning that all member -functions with visible definitions will be considered for inlining. In some -cases the analyzer may still choose not to inline the function. - -Note that under 'constructors', constructors for types with non-trivial -destructors will not be inlined. Additionally, no C++ member functions will be -inlined under -analyzer-config ipa=none or -analyzer-config ipa=basic-inlining, -regardless of the setting of the c++-inlining mode. - -### c++-template-inlining ### - -This option controls whether C++ templated functions may be inlined. - - -analyzer-config c++-template-inlining=[true | false] - -Currently, template functions are considered for inlining by default. - -The motivation behind this option is that very generic code can be a source -of false positives, either by considering paths that the caller considers -impossible (by some unstated precondition), or by inlining some but not all -of a deep implementation of a function. - -### c++-stdlib-inlining ### - -This option controls whether functions from the C++ standard library, including -methods of the container classes in the Standard Template Library, should be -considered for inlining. - - -analyzer-config c++-stdlib-inlining=[true | false] - -Currently, C++ standard library functions are considered for inlining by -default. 
- -The standard library functions and the STL in particular are used ubiquitously -enough that our tolerance for false positives is even lower here. A false -positive due to poor modeling of the STL leads to a poor user experience, since -most users would not be comfortable adding assertions to system headers in order -to silence analyzer warnings. - -### c++-container-inlining ### - -This option controls whether constructors and destructors of "container" types -should be considered for inlining. - - -analyzer-config c++-container-inlining=[true | false] - -Currently, these constructors and destructors are NOT considered for inlining -by default. - -The current implementation of this setting checks whether a type has a member -named 'iterator' or a member named 'begin'; these names are idiomatic in C++, -with the latter specified in the C++11 standard. The analyzer currently does a -fairly poor job of modeling certain data structure invariants of container-like -objects. For example, these three expressions should be equivalent: - - std::distance(c.begin(), c.end()) == 0 - c.begin() == c.end() - c.empty()) - -Many of these issues are avoided if containers always have unknown, symbolic -state, which is what happens when their constructors are treated as opaque. -In the future, we may decide specific containers are "safe" to model through -inlining, or choose to model them directly using checkers instead. - - -Basics of Implementation ------------------------ - -The low-level mechanism of inlining a function is handled in -ExprEngine::inlineCall and ExprEngine::processCallExit. - -If the conditions are right for inlining, a CallEnter node is created and added -to the analysis work list. The CallEnter node marks the change to a new -LocationContext representing the called function, and its state includes the -contents of the new stack frame. When the CallEnter node is actually processed, -its single successor will be a edge to the first CFG block in the function. 
- -Exiting an inlined function is a bit more work, fortunately broken up into -reasonable steps: - -1. The CoreEngine realizes we're at the end of an inlined call and generates a - CallExitBegin node. - -2. ExprEngine takes over (in processCallExit) and finds the return value of the - function, if it has one. This is bound to the expression that triggered the - call. (In the case of calls without origin expressions, such as destructors, - this step is skipped.) - -3. Dead symbols and bindings are cleaned out from the state, including any local - bindings. - -4. A CallExitEnd node is generated, which marks the transition back to the - caller's LocationContext. - -5. Custom post-call checks are processed and the final nodes are pushed back - onto the work list, so that evaluation of the caller can continue. - -Retry Without Inlining ----------------------- - -In some cases, we would like to retry analysis without inlining a particular -call. - -Currently, we use this technique to recover coverage in case we stop -analyzing a path due to exceeding the maximum block count inside an inlined -function. - -When this situation is detected, we walk up the path to find the first node -before inlining was started and enqueue it on the WorkList with a special -ReplayWithoutInlining bit added to it (ExprEngine::replayWithoutInlining). The -path is then re-analyzed from that point without inlining that particular call. - -Deciding When to Inline ------------------------ - -In general, the analyzer attempts to inline as much as possible, since it -provides a better summary of what actually happens in the program. There are -some cases, however, where the analyzer chooses not to inline: - -- If there is no definition available for the called function or method. In - this case, there is no opportunity to inline. - -- If the CFG cannot be constructed for a called function, or the liveness - cannot be computed. 
These are prerequisites for analyzing a function body, - with or without inlining. - -- If the LocationContext chain for a given ExplodedNode reaches a maximum cutoff - depth. This prevents unbounded analysis due to infinite recursion, but also - serves as a useful cutoff for performance reasons. - -- If the function is variadic. This is not a hard limitation, but an engineering - limitation. - - Tracked by: Support inlining of variadic functions - -- In C++, constructors are not inlined unless the destructor call will be - processed by the ExprEngine. Thus, if the CFG was built without nodes for - implicit destructors, or if the destructors for the given object are not - represented in the CFG, the constructor will not be inlined. (As an exception, - constructors for objects with trivial constructors can still be inlined.) - See "C++ Caveats" below. - -- In C++, ExprEngine does not inline custom implementations of operator 'new' - or operator 'delete', nor does it inline the constructors and destructors - associated with these. See "C++ Caveats" below. - -- Calls resulting in "dynamic dispatch" are specially handled. See more below. - -- The FunctionSummaries map stores additional information about declarations, - some of which is collected at runtime based on previous analyses. - We do not inline functions which were not profitable to inline in a different - context (for example, if the maximum block count was exceeded; see - "Retry Without Inlining"). - - -Dynamic Calls and Devirtualization ----------------------------------- - -"Dynamic" calls are those that are resolved at runtime, such as C++ virtual -method calls and Objective-C message sends. Due to the path-sensitive nature of -the analysis, the analyzer may be able to reason about the dynamic type of the -object whose method is being called and thus "devirtualize" the call. - -This path-sensitive devirtualization occurs when the analyzer can determine what -method would actually be called at runtime. 
This is possible when the type -information is constrained enough for a simulated C++/Objective-C object that -the analyzer can make such a decision. - - == DynamicTypeInfo == - -As the analyzer analyzes a path, it may accrue information to refine the -knowledge about the type of an object. This can then be used to make better -decisions about the target method of a call. - -Such type information is tracked as DynamicTypeInfo. This is path-sensitive -data that is stored in ProgramState, which defines a mapping from MemRegions to -an (optional) DynamicTypeInfo. - -If no DynamicTypeInfo has been explicitly set for a MemRegion, it will be lazily -inferred from the region's type or associated symbol. Information from symbolic -regions is weaker than from true typed regions. - - EXAMPLE: A C++ object declared "A obj" is known to have the class 'A', but a - reference "A &ref" may dynamically be a subclass of 'A'. - -The DynamicTypePropagation checker gathers and propagates DynamicTypeInfo, -updating it as information is observed along a path that can refine that type -information for a region. - - WARNING: Not all of the existing analyzer code has been retrofitted to use - DynamicTypeInfo, nor is it universally appropriate. In particular, - DynamicTypeInfo always applies to a region with all casts stripped - off, but sometimes the information provided by casts can be useful. - - - == RuntimeDefinition == - -The basis of devirtualization is CallEvent's getRuntimeDefinition() method, -which returns a RuntimeDefinition object. When asked to provide a definition, -the CallEvents for dynamic calls will use the DynamicTypeInfo in their -ProgramState to attempt to devirtualize the call. In the case of no dynamic -dispatch, or perfectly constrained devirtualization, the resulting -RuntimeDefinition contains a Decl corresponding to the definition of the called -function, and RuntimeDefinition::mayHaveOtherDefinitions will return FALSE. 
- -In the case of dynamic dispatch where our information is not perfect, CallEvent -can make a guess, but RuntimeDefinition::mayHaveOtherDefinitions will return -TRUE. The RuntimeDefinition object will then also include a MemRegion -corresponding to the object being called (i.e., the "receiver" in Objective-C -parlance), which ExprEngine uses to decide whether or not the call should be -inlined. - - == Inlining Dynamic Calls == - -The -analyzer-config ipa option has five different modes: none, basic-inlining, -inlining, dynamic, and dynamic-bifurcate. Under -analyzer-config ipa=dynamic, -all dynamic calls are inlined, whether we are certain or not that this will -actually be the definition used at runtime. Under -analyzer-config ipa=inlining, -only "near-perfect" devirtualized calls are inlined*, and other dynamic calls -are evaluated conservatively (as if no definition were available). - -* Currently, no Objective-C messages are not inlined under - -analyzer-config ipa=inlining, even if we are reasonably confident of the type - of the receiver. We plan to enable this once we have tested our heuristics - more thoroughly. - -The last option, -analyzer-config ipa=dynamic-bifurcate, behaves similarly to -"dynamic", but performs a conservative invalidation in the general virtual case -in *addition* to inlining. The details of this are discussed below. - -As stated above, -analyzer-config ipa=basic-inlining does not inline any C++ -member functions or Objective-C method calls, even if they are non-virtual or -can be safely devirtualized. - - -Bifurcation ------------ - -ExprEngine::BifurcateCall implements the -analyzer-config ipa=dynamic-bifurcate -mode. - -When a call is made on an object with imprecise dynamic type information -(RuntimeDefinition::mayHaveOtherDefinitions() evaluates to TRUE), ExprEngine -bifurcates the path and marks the object's region (retrieved from the -RuntimeDefinition object) with a path-sensitive "mode" in the ProgramState. 
- -Currently, there are 2 modes: - - DynamicDispatchModeInlined - Models the case where the dynamic type information - of the receiver (MemoryRegion) is assumed to be perfectly constrained so - that a given definition of a method is expected to be the code actually - called. When this mode is set, ExprEngine uses the Decl from - RuntimeDefinition to inline any dynamically dispatched call sent to this - receiver because the function definition is considered to be fully resolved. - - DynamicDispatchModeConservative - Models the case where the dynamic type - information is assumed to be incorrect, for example, implies that the method - definition is overridden in a subclass. In such cases, ExprEngine does not - inline the methods sent to the receiver (MemoryRegion), even if a candidate - definition is available. This mode is conservative about simulating the - effects of a call. - -Going forward along the symbolic execution path, ExprEngine consults the mode -of the receiver's MemRegion to make decisions on whether the calls should be -inlined or not, which ensures that there is at most one split per region. - -At a high level, "bifurcation mode" allows for increased semantic coverage in -cases where the parent method contains code which is only executed when the -class is subclassed. The disadvantages of this mode are a (considerable?) -performance hit and the possibility of false positives on the path where the -conservative mode is used. - -Objective-C Message Heuristics ------------------------------- - -ExprEngine relies on a set of heuristics to partition the set of Objective-C -method calls into those that require bifurcation and those that do not. Below -are the cases when the DynamicTypeInfo of the object is considered precise -(cannot be a subclass): - - - If the object was created with +alloc or +new and initialized with an -init - method. - - - If the calls are property accesses using dot syntax. 
This is based on the - assumption that children rarely override properties, or do so in an - essentially compatible way. - - - If the class interface is declared inside the main source file. In this case - it is unlikely that it will be subclassed. - - - If the method is not declared outside of main source file, either by the - receiver's class or by any superclasses. - -C++ Caveats --------------------- - -C++11 [class.cdtor]p4 describes how the vtable of an object is modified as it is -being constructed or destructed; that is, the type of the object depends on -which base constructors have been completed. This is tracked using -DynamicTypeInfo in the DynamicTypePropagation checker. - -There are several limitations in the current implementation: - -- Temporaries are poorly modeled right now because we're not confident in the - placement of their destructors in the CFG. We currently won't inline their - constructors unless the destructor is trivial, and don't process their - destructors at all, not even to invalidate the region. - -- 'new' is poorly modeled due to some nasty CFG/design issues. This is tracked - in PR12014. 'delete' is not modeled at all. - -- Arrays of objects are modeled very poorly right now. ExprEngine currently - only simulates the first constructor and first destructor. Because of this, - ExprEngine does not inline any constructors or destructors for arrays. - - -CallEvent -========= - -A CallEvent represents a specific call to a function, method, or other body of -code. It is path-sensitive, containing both the current state (ProgramStateRef) -and stack space (LocationContext), and provides uniform access to the argument -values and return type of a call, no matter how the call is written in the -source or what sort of code body is being invoked. - - NOTE: For those familiar with Cocoa, CallEvent is roughly equivalent to - NSInvocation. 
- -CallEvent should be used whenever there is logic dealing with function calls -that does not care how the call occurred. - -Examples include checking that arguments satisfy preconditions (such as -__attribute__((nonnull))), and attempting to inline a call. - -CallEvents are reference-counted objects managed by a CallEventManager. While -there is no inherent issue with persisting them (say, in a ProgramState's GDM), -they are intended for short-lived use, and can be recreated from CFGElements or -non-top-level StackFrameContexts fairly easily. Index: docs/analyzer/RegionStore.txt =================================================================== --- docs/analyzer/RegionStore.txt +++ /dev/null @@ -1,171 +0,0 @@ -The analyzer "Store" represents the contents of memory regions. It is an opaque -functional data structure stored in each ProgramState; the only class that can -modify the store is its associated StoreManager. - -Currently (Feb. 2013), the only StoreManager implementation being used is -RegionStoreManager. This store records bindings to memory regions using a "base -region + offset" key. (This allows `*p` and `p[0]` to map to the same location, -among other benefits.) - -Regions are grouped into "clusters", which roughly correspond to "regions with -the same base region". This allows certain operations to be more efficient, -such as invalidation. - -Regions that do not have a known offset use a special "symbolic" offset. These -keys store both the original region, and the "concrete offset region" -- the -last region whose offset is entirely concrete. (For example, in the expression -`foo.bar[1][i].baz`, the concrete offset region is the array `foo.bar[1]`, -since that has a known offset from the start of the top-level `foo` struct.) - - -Binding Invalidation -==================== - -Supporting both concrete and symbolic offsets makes things a bit tricky. 
Here's -an example: - - foo[0] = 0; - foo[1] = 1; - foo[i] = i; - -After the third assignment, nothing can be said about the value of `foo[0]`, -because `foo[i]` may have overwritten it! Thus, *binding to a region with a -symbolic offset invalidates the entire concrete offset region.* We know -`foo[i]` is somewhere within `foo`, so we don't have to invalidate anything -else, but we do have to be conservative about all other bindings within `foo`. - -Continuing the example: - - foo[i] = i; - foo[0] = 0; - -After this latest assignment, nothing can be said about the value of `foo[i]`, -because `foo[0]` may have overwritten it! *Binding to a region R with a -concrete offset invalidates any symbolic offset bindings whose concrete offset -region is a super-region **or** sub-region of R.* All we know about `foo[i]` is -that it is somewhere within `foo`, so changing *anything* within `foo` might -change `foo[i]`, and changing *all* of `foo` (or its base region) will -*definitely* change `foo[i]`. - -This logic could be improved by using the current constraints on `i`, at the -cost of speed. The latter case could also be improved by matching region kinds, -i.e. changing `foo[0].a` is unlikely to affect `foo[i].b`, no matter what `i` -is. - -For more detail, read through RegionStoreManager::removeSubRegionBindings in -RegionStore.cpp. - - -ObjCIvarRegions -=============== - -Objective-C instance variables require a bit of special handling. Like struct -fields, they are not base regions, and when their parent object region is -invalidated, all the instance variables must be invalidated as well. However, -they have no concrete compile-time offsets (in the modern, "non-fragile" -runtime), and so cannot easily be represented as an offset from the start of -the object in the analyzer. 
Moreover, this means that invalidating a single -instance variable should *not* invalidate the rest of the object, since unlike -struct fields or array elements there is no way to perform pointer arithmetic -to access another instance variable. - -Consequently, although the base region of an ObjCIvarRegion is the entire -object, RegionStore offsets are computed from the start of the instance -variable. Thus it is not valid to assume that all bindings with non-symbolic -offsets start from the base region! - - -Region Invalidation -=================== - -Unlike binding invalidation, region invalidation occurs when the entire -contents of a region may have changed---say, because it has been passed to a -function the analyzer can model, like memcpy, or because its address has -escaped, usually as an argument to an opaque function call. In these cases we -need to throw away not just all bindings within the region itself, but within -its entire cluster, since neighboring regions may be accessed via pointer -arithmetic. - -Region invalidation typically does even more than this, however. Because it -usually represents the complete escape of a region from the analyzer's model, -its *contents* must also be transitively invalidated. (For example, if a region -'p' of type 'int **' is invalidated, the contents of '*p' and '**p' may have -changed as well.) The algorithm that traverses this transitive closure of -accessible regions is known as ClusterAnalysis, and is also used for finding -all live bindings in the store (in order to throw away the dead ones). The name -"ClusterAnalysis" predates the cluster-based organization of bindings, but -refers to the same concept: during invalidation and liveness analysis, all -bindings within a cluster must be treated in the same way for a conservative -model of program behavior. - - -Default Bindings -================ - -Most bindings in RegionStore are simple scalar values -- integers and pointers. -These are known as "Direct" bindings. 
However, RegionStore supports a second -type of binding called a "Default" binding. These are used to provide values to -all the elements of an aggregate type (struct or array) without having to -explicitly specify a binding for each individual element. - -When there is no Direct binding for a particular region, the store manager -looks at each super-region in turn to see if there is a Default binding. If so, -this value is used as the value of the original region. The search ends when -the base region is reached, at which point the RegionStore will pick an -appropriate default value for the region (usually a symbolic value, but -sometimes zero, for static data, or "uninitialized", for stack variables). - - int manyInts[10]; - manyInts[1] = 42; // Creates a Direct binding for manyInts[1]. - print(manyInts[1]); // Retrieves the Direct binding for manyInts[1]; - print(manyInts[0]); // There is no Direct binding for manyInts[0]. - // Is there a Default binding for the entire array? - // There is not, but it is a stack variable, so we use - // "uninitialized" as the default value (and emit a - // diagnostic!). - -NOTE: The fact that bindings are stored as a base region plus an offset limits -the Default Binding strategy, because in C aggregates can contain other -aggregates. In the current implementation of RegionStore, there is no way to -distinguish a Default binding for an entire aggregate from a Default binding -for the sub-aggregate at offset 0. - - -Lazy Bindings (LazyCompoundVal) -=============================== - -RegionStore implements an optimization for copying aggregates (structs and -arrays) called "lazy bindings", implemented using a special SVal called -LazyCompoundVal. When the store is asked for the "binding" for an entire -aggregate (i.e. for an lvalue-to-rvalue conversion), it returns a -LazyCompoundVal instead. When this value is then stored into a variable, it is -bound as a Default value. 
This makes copying arrays and structs much cheaper -than if they had required memberwise access. - -Under the hood, a LazyCompoundVal is implemented as a uniqued pair of (region, -store), representing "the value of the region during this 'snapshot' of the -store". This has important implications for any sort of liveness or -reachability analysis, which must take the bindings in the old store into -account. - -Retrieving a value from a lazy binding happens in the same way as any other -Default binding: since there is no direct binding, the store manager falls back -to super-regions to look for an appropriate default binding. LazyCompoundVal -differs from a normal default binding, however, in that it contains several -different values, instead of one value that will appear several times. Because -of this, the store manager has to reconstruct the subregion chain on top of the -LazyCompoundVal region, and look up *that* region in the previous store. - -Here's a concrete example: - - CGPoint p; - p.x = 42; // A Direct binding is made to the FieldRegion 'p.x'. - CGPoint p2 = p; // A LazyCompoundVal is created for 'p', along with a - // snapshot of the current store state. This value is then - // used as a Default binding for the VarRegion 'p2'. - return p2.x; // The binding for FieldRegion 'p2.x' is requested. - // There is no Direct binding, so we look for a Default - // binding to 'p2' and find the LCV. - // Because it's a LCV, we look at our requested region - // and see that it's the '.x' field. We ask for the value - // of 'p.x' within the snapshot, and get back 42. Index: docs/analyzer/checkers.rst =================================================================== --- /dev/null +++ docs/analyzer/checkers.rst @@ -0,0 +1,262 @@ +===================== +Checkers +===================== + +The analyzer performs checks that are categorized into families or "checkers". 
+The default set of checkers covers a variety of checks targeted at finding security and API usage bugs, +dead code, and other logic errors. See the :ref:`default-checkers` list below. +In addition to these, the analyzer contains a number of :ref:`alpha-checkers`. + +.. contents:: Table of Contents + :depth: 3 + + +.. _default-checkers: + +Default Checkers +---------------- + +.. _core-checkers: + +core +^^^^ +Models core language features and contains general-purpose checkers such as division by zero, +null pointer dereference, usage of uninitialized values, etc. +*These checkers must always be switched on, as other checkers rely on them.* + +* **core.CallAndMessage** (C, C++, ObjC) Check for logical errors for function calls and Objective-C message expressions (e.g., uninitialized arguments, null function pointers) +* **core.DivideZero** (C, C++, ObjC) Check for division by zero +* **core.NonNullParamChecker** (C, C++, ObjC) Check for null pointers passed as arguments to a function whose arguments are references or marked with the 'nonnull' attribute +* **core.NullDereference** (C, C++, ObjC) Check for dereferences of null pointers +* **core.StackAddressEscape** (C) Check that addresses to stack memory do not escape the function +* **core.UndefinedBinaryOperatorResult** (C) Check for undefined results of binary operators +* **core.VLASize** (C) Check for declarations of VLA of undefined or zero size +* **core.uninitialized.ArraySubscript** (C) Check for uninitialized values used as array subscripts +* **core.uninitialized.Assign** (C) Check for assigning uninitialized values +* **core.uninitialized.Branch** (C) Check for uninitialized values used as branch conditions +* **core.uninitialized.CapturedBlockVariable** (C) Check for blocks that capture uninitialized values +* **core.uninitialized.UndefReturn** (C) Check for uninitialized values being returned to the caller + +.. 
_cplusplus-checkers: + +cplusplus +^^^^^^^^^ + +C++ Checkers. + +* **cplusplus.InnerPointer** Check for inner pointers of C++ containers used after re/deallocation +* **cplusplus.NewDelete** (C++) Check for double-free and use-after-free problems. Traces memory managed by new/delete. +* **cplusplus.NewDeleteLeaks** (C++) Check for memory leaks. Traces memory managed by new/delete. +* **cplusplus.SelfAssignment** Checks C++ copy and move assignment operators for self assignment + +.. _deadcode-checkers: + +deadcode +^^^^^^^^ + +Dead Code Checkers. + +* **deadcode.DeadStores** (C) Check for values stored to variables that are never read afterwards + + +.. _nullability-checkers: + +nullability +^^^^^^^^^^^ + +Objective-C checkers that warn about null pointer passing and dereferencing errors. + +* **nullability.NullPassedToNonnull** (ObjC) Warns when a null pointer is passed to a pointer which has a _Nonnull type. +* **nullability.NullReturnedFromNonnull** (ObjC) Warns when a null pointer is returned from a function that has _Nonnull return type. +* **nullability.NullableDereferenced** (ObjC) Warns when a nullable pointer is dereferenced. +* **nullability.NullablePassedToNonnull** (ObjC) Warns when a nullable pointer is passed to a pointer which has a _Nonnull type. +* **nullability.NullableReturnedFromNonnull** (ObjC) Warns when a nullable pointer is returned from a function that has _Nonnull return type. + +.. _optin-checkers: + +optin +^^^^^ + +Checkers for portability, performance or coding style specific rules.
+ +* **optin.cplusplus.VirtualCall** (C++) Check virtual function calls during construction or destruction +* **optin.mpi.MPI-Checker** (C) Checks MPI code +* **optin.osx.cocoa.localizability.EmptyLocalizationContextChecker** (ObjC) Check that NSLocalizedString macros include a comment for context +* **optin.osx.cocoa.localizability.NonLocalizedStringChecker** (ObjC) Warns about uses of non-localized NSStrings passed to UI methods expecting localized NSStrings +* **optin.performance.GCDAntipattern** Check for performance anti-patterns when using Grand Central Dispatch +* **optin.performance.Padding** Check for excessively padded structs. +* **optin.portability.UnixAPI** Finds implementation-defined behavior in UNIX/Posix functions + + +.. _security-checkers: + +security +^^^^^^^^ + +Security related checkers. + +* **security.FloatLoopCounter** (C) Warn on using a floating point value as a loop counter (CERT: FLP30-C, FLP30-CPP) +* **security.insecureAPI.UncheckedReturn** (C) Warn on uses of functions whose return values must always be checked +* **security.insecureAPI.bcmp** (C) Warn on uses of the 'bcmp' function +* **security.insecureAPI.bcopy** (C) Warn on uses of the 'bcopy' function +* **security.insecureAPI.bzero** (C) Warn on uses of the 'bzero' function +* **security.insecureAPI.getpw** (C) Warn on uses of the 'getpw' function +* **security.insecureAPI.gets** (C) Warn on uses of the 'gets' function +* **security.insecureAPI.mkstemp** (C) Warn when 'mkstemp' is passed fewer than 6 X's in the format string +* **security.insecureAPI.mktemp** (C) Warn on uses of the 'mktemp' function +* **security.insecureAPI.rand** (C) Warn on uses of the 'rand', 'random', and related functions +* **security.insecureAPI.strcpy** (C) Warn on uses of the 'strcpy' and 'strcat' functions +* **security.insecureAPI.vfork** (C) Warn on uses of the 'vfork' function + + +.. _unix-checkers: + +unix +^^^^ +POSIX/Unix checkers.
+ +* **unix.API** (C) Check calls to various UNIX/Posix functions +* **unix.Malloc** (C) Check for memory leaks, double free, and use-after-free problems. Traces memory managed by malloc()/free(). +* **unix.MallocSizeof** (C) Check for dubious malloc arguments involving sizeof +* **unix.MismatchedDeallocator** (C) Check for mismatched deallocators. +* **unix.Vfork** (C) Check for proper usage of vfork +* **unix.cstring.BadSizeArg** (C) Check the size argument passed into C string functions for common erroneous patterns +* **unix.cstring.NullArg** (C) Check for null pointers being passed as arguments to C string functions + + +.. _osx-checkers: + +osx +^^^ +OS X checkers. + +* **osx.API** (C) Check for proper uses of various Apple APIs +* **osx.NumberObjectConversion** (C, C++, ObjC) Check for erroneous conversions of objects representing numbers into numbers +* **osx.ObjCProperty** Check for proper uses of Objective-C properties +* **osx.SecKeychainAPI** (C) Check for proper uses of Secure Keychain APIs +* **osx.cocoa.AtSync** (ObjC) Check for nil pointers used as mutexes for @synchronized +* **osx.cocoa.AutoreleaseWrite** Warn about potentially crashing writes to autoreleasing objects from different autoreleasing pools in Objective-C +* **osx.cocoa.ClassRelease** (ObjC) Check for sending 'retain', 'release', or 'autorelease' directly to a Class +* **osx.cocoa.Dealloc** (ObjC) Warn about Objective-C classes that lack a correct implementation of -dealloc +* **osx.cocoa.IncompatibleMethodTypes** (ObjC) Warn about Objective-C method signatures with type incompatibilities +* **osx.cocoa.Loops** Improved modeling of loops using Cocoa collection types +* **osx.cocoa.MissingSuperCall** (ObjC) Warn about Objective-C methods that lack a necessary call to super +* **osx.cocoa.NSAutoreleasePool** (ObjC) Warn for suboptimal uses of NSAutoreleasePool in Objective-C GC mode +* **osx.cocoa.NSError** (ObjC) Check usage of NSError parameters +* **osx.cocoa.NilArg** (ObjC) Check 
for prohibited nil arguments to ObjC method calls +* **osx.cocoa.NonNilReturnValue** Model the APIs that are guaranteed to return a non-nil value +* **osx.cocoa.ObjCGenerics** (ObjC) Check for type errors when using Objective-C generics +* **osx.cocoa.RetainCount** (ObjC) Check for leaks and improper reference count management +* **osx.cocoa.RunLoopAutoreleaseLeak** Check for leaked memory in autorelease pools that will never be drained +* **osx.cocoa.SelfInit** (ObjC) Check that 'self' is properly initialized inside an initializer method +* **osx.cocoa.SuperDealloc** (ObjC) Warn about improper use of '[super dealloc]' in Objective-C +* **osx.cocoa.UnusedIvars** (ObjC) Warn about private ivars that are never used +* **osx.cocoa.VariadicMethodTypes** (ObjC) Check for passing non-Objective-C types to variadic collection initialization methods that expect only Objective-C types +* **osx.coreFoundation.CFError** (C) Check usage of CFErrorRef* parameters +* **osx.coreFoundation.CFNumber** (C) Check for proper uses of CFNumber APIs +* **osx.coreFoundation.CFRetainRelease** (C) Check for null arguments to CFRetain/CFRelease/CFMakeCollectable +* **osx.coreFoundation.containers.OutOfBounds** (C) Checks for index out-of-bounds when using 'CFArray' API +* **osx.coreFoundation.containers.PointerSizedValues** (C) Warns if 'CFArray', 'CFDictionary', 'CFSet' are created with non-pointer-size values + + +.. _alpha-checkers: + +Experimental Checkers +--------------------- + +*These are checkers with known issues or limitations that keep them from being on by default. They are likely to have false positives. Bug reports and especially patches are welcome.* + +alpha.clone +^^^^^^^^^^^ + +* **alpha.clone.CloneChecker** Reports similar pieces of code. +* **alpha.core.BoolAssignment** Warn about assigning non-{0,1} values to Boolean variables
+ +alpha.core +^^^^^^^^^^ + +* **alpha.core.CallAndMessageUnInitRefArg** Check for logical errors for function calls and Objective-C message expressions (e.g., uninitialized arguments, null function pointers, and pointer to undefined variables) +* **alpha.core.CastSize** Check when casting a malloc'ed type T, whether the size is a multiple of the size of T +* **alpha.core.CastToStruct** Check for cast from non-struct pointer to struct pointer +* **alpha.core.Conversion** Loss of sign/precision in implicit conversions +* **alpha.core.DynamicTypeChecker** Check for cases where the dynamic and the static type of an object are unrelated. +* **alpha.core.FixedAddr** Check for assignment of a fixed address to a pointer +* **alpha.core.IdenticalExpr** Warn about unintended use of identical expressions in operators +* **alpha.core.PointerArithm** Check for pointer arithmetic on locations other than array elements +* **alpha.core.PointerSub** Check for pointer subtractions on two pointers pointing to different memory chunks +* **alpha.core.SizeofPtr** Warn about unintended use of sizeof() on pointer expressions +* **alpha.core.StackAddressAsyncEscape** Check that addresses to stack memory do not escape the function +* **alpha.core.TestAfterDivZero** Check for division by variable that is later compared against 0. Either the comparison is useless or there is division by zero. 
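The pattern described for alpha.core.TestAfterDivZero in the list above can be illustrated with a small sketch. This is not taken from the analyzer's test suite; the function names ``divide_then_test`` and ``test_then_divide`` are hypothetical and exist only to contrast the flagged ordering with one possible rewrite.

```cpp
// Shape of code the checker description refers to: 'd' is used as a divisor
// first and only compared against 0 afterwards, so the comparison is either
// useless or comes too late.
int divide_then_test(int n, int d) {
  int q = n / d;  // division happens before any check on 'd'
  if (d == 0)     // if 'd' could be 0, the division above was already invalid
    return 0;
  return q;
}

// Hypothetical rewrite: test the divisor first and divide only on the path
// where it is known to be non-zero. Returns false when no quotient exists.
bool test_then_divide(int n, int d, int &out) {
  if (d == 0)
    return false;
  out = n / d;
  return true;
}
```

In the second form the comparison guards the division, so neither of the two outcomes named in the checker description ("the comparison is useless" or "there is division by zero") applies.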
+ +alpha.cplusplus +^^^^^^^^^^^^^^^ + +* **alpha.cplusplus.DeleteWithNonVirtualDtor** Reports destructions of polymorphic objects with a non-virtual destructor in their base class +* **alpha.cplusplus.InvalidatedIterator** Check for use of invalidated iterators +* **alpha.cplusplus.IteratorRange** Check for iterators used outside their valid ranges +* **alpha.cplusplus.MismatchedIterator** Check for use of iterators of different containers where iterators of the same container are expected +* **alpha.cplusplus.MisusedMovedObject** Method calls on a moved-from object and copying a moved-from object will be reported +* :doc:`checkers/UninitializedObject` Reports uninitialized fields after object construction + +alpha.deadcode +^^^^^^^^^^^^^^ +* **alpha.deadcode.UnreachableCode** Check unreachable code + +alpha.osx +^^^^^^^^^ + +* **alpha.osx.cocoa.DirectIvarAssignment** Check for direct assignments to instance variables +* **alpha.osx.cocoa.DirectIvarAssignmentForAnnotatedFunctions** Check for direct assignments to instance variables in the methods annotated with objc_no_direct_instance_variable_assignment +* **alpha.osx.cocoa.InstanceVariableInvalidation** Check that the invalidatable instance variables are invalidated in the methods annotated with objc_instance_variable_invalidator +* **alpha.osx.cocoa.MissingInvalidationMethod** Check that the invalidation methods are present in classes that contain invalidatable instance variables +* **alpha.osx.cocoa.localizability.PluralMisuseChecker** Warns against using one vs. many plural pattern in code when generating localized strings.
+ +alpha.security +^^^^^^^^^^^^^^ +* **alpha.security.ArrayBound** Warn about buffer overflows (older checker) +* **alpha.security.ArrayBoundV2** Warn about buffer overflows (newer checker) +* **alpha.security.MallocOverflow** Check for overflows in the arguments to malloc() +* **alpha.security.MmapWriteExec** Warn on mmap() calls that are both writable and executable +* **alpha.security.ReturnPtrRange** Check for an out-of-bound pointer being returned to callers +* **alpha.security.taint.TaintPropagation** Generate taint information used by other checkers + +alpha.unix +^^^^^^^^^^^ + +* **alpha.unix.BlockInCriticalSection** Check for calls to blocking functions inside a critical section +* **alpha.unix.Chroot** Check improper use of chroot +* **alpha.unix.PthreadLock** Simple lock -> unlock checker +* **alpha.unix.SimpleStream** Check for misuses of stream APIs +* **alpha.unix.Stream** Check stream handling functions +* **alpha.unix.cstring.BufferOverlap** Checks for overlap in two buffer arguments +* **alpha.unix.cstring.NotNullTerminated** Check for arguments which are not null-terminating strings +* **alpha.unix.cstring.OutOfBounds** Check for out-of-bounds access in string functions + + +Debug Checkers +--------------- + +.. _debug-checkers: + + +debug +^^^^^ + +Checkers used for debugging the analyzer. +:doc:`DebugChecks` page contains a detailed description. 
+ +* **debug.AnalysisOrder** Print callbacks that are called during analysis in order +* **debug.ConfigDumper** Dump config table +* **debug.DumpCFG** Display Control-Flow Graphs +* **debug.DumpCallGraph** Display Call Graph +* **debug.DumpCalls** Print calls as they are traversed by the engine +* **debug.DumpDominators** Print the dominance tree for a given CFG +* **debug.DumpLiveVars** Print results of live variable analysis +* **debug.DumpTraversal** Print branch conditions as they are traversed by the engine +* **debug.ExprInspection** Check the analyzer's understanding of expressions +* **debug.Stats** Emit warnings with analyzer statistics +* **debug.TaintTest** Mark tainted symbols as such. +* **debug.ViewCFG** View Control-Flow Graphs using GraphViz +* **debug.ViewCallGraph** View Call Graph using GraphViz +* **debug.ViewExplodedGraph** View Exploded Graphs using GraphViz + Index: docs/analyzer/checkers/UninitializedObject.rst =================================================================== --- /dev/null +++ docs/analyzer/checkers/UninitializedObject.rst @@ -0,0 +1,103 @@ +***************************************** +alpha.cplusplus.UninitializedObject (C++) +***************************************** + +This checker reports uninitialized fields in objects created after a constructor call. +It doesn't only find direct uninitialized fields, but rather makes a deep inspection +of the object, analyzing all of its fields and subfields. +The checker regards inherited fields as direct fields, so one will +receive warnings for uninitialized inherited data members as well. + +Examples +-------- + +.. 
code-block:: cpp + + // With Pedantic and CheckPointeeInitialization set to true + + struct A { + struct B { + int x; // note: uninitialized field 'this->b.x' + // note: uninitialized field 'this->bptr->x' + int y; // note: uninitialized field 'this->b.y' + // note: uninitialized field 'this->bptr->y' + }; + int *iptr; // note: uninitialized pointer 'this->iptr' + B b; + B *bptr; + char *cptr; // note: uninitialized pointee 'this->cptr' + + A (B *bptr, char *cptr) : bptr(bptr), cptr(cptr) {} + }; + + void f() { + A::B b; + char c; + A a(&b, &c); // warning: 6 uninitialized fields + // after the constructor call + } + + // With Pedantic set to false and + // CheckPointeeInitialization set to true + // (every field is uninitialized) + + struct A { + struct B { + int x; + int y; + }; + int *iptr; + B b; + B *bptr; + char *cptr; + + A (B *bptr, char *cptr) : bptr(bptr), cptr(cptr) {} + }; + + void f() { + A::B b; + char c; + A a(&b, &c); // no warning + } + + // With Pedantic set to true and + // CheckPointeeInitialization set to false + // (pointees are regarded as initialized) + + struct A { + struct B { + int x; // note: uninitialized field 'this->b.x' + int y; // note: uninitialized field 'this->b.y' + }; + int *iptr; // note: uninitialized pointer 'this->iptr' + B b; + B *bptr; + char *cptr; + + A (B *bptr, char *cptr) : bptr(bptr), cptr(cptr) {} + }; + + void f() { + A::B b; + char c; + A a(&b, &c); // warning: 3 uninitialized fields + // after the constructor call + } + + +Options +------- + +This checker has several options which can be set from the command line (e.g. ``-analyzer-config alpha.cplusplus.UninitializedObject:Pedantic=true``): + +* **Pedantic** (boolean). If set to false, the checker won't emit warnings for objects that don't have at least one initialized field. *Defaults to false.* + +* **NotesAsWarnings** (boolean). 
If set to true, the checker will emit a warning for each uninitialized field, as opposed to emitting one warning per constructor call and listing the uninitialized fields that belong to it in notes. *Defaults to false.* + +* **CheckPointeeInitialization** (boolean). If set to false, the checker will not analyze the pointee of pointer/reference fields, and will only check whether the object itself is initialized. *Defaults to false.* + +* **IgnoreRecordsWithField** (string). If supplied, the checker will not analyze structures that have a field with a name or type name that matches the given pattern. *Defaults to ""*. Can be set with ``-analyzer-config alpha.cplusplus.UninitializedObject:IgnoreRecordsWithField="[Tt]ag|[Kk]ind"``. + +Limitations & Known False Positives +----------------------------------- +None. Index: docs/analyzer/nullability.rst =================================================================== --- docs/analyzer/nullability.rst +++ /dev/null @@ -1,92 +0,0 @@ -============ -Nullability Checks -============ - -This document is a high level description of the nullablility checks. -These checks intended to use the annotations that is described in this -RFC: http://lists.cs.uiuc.edu/pipermail/cfe-dev/2015-March/041798.html. - -Let's consider the following 2 categories: - -1) nullable -============ - -If a pointer 'p' has a nullable annotation and no explicit null check or assert, we should warn in the following cases: -- 'p' gets implicitly converted into nonnull pointer, for example, we are passing it to a function that takes a nonnull parameter. -- 'p' gets dereferenced - -Taking a branch on nullable pointers are the same like taking branch on null unspecified pointers. 
- -Explicit cast from nullable to nonnul:: - - __nullable id foo; - id bar = foo; - takesNonNull((_nonnull) bar); <— should not warn here (backward compatibility hack) - anotherTakesNonNull(bar); <— would be great to warn here, but not necessary(*) - -Because bar corresponds to the same symbol all the time it is not easy to implement the checker that way the cast only suppress the first call but not the second. For this reason in the first implementation after a contradictory cast happens, I will treat bar as nullable unspecified, this way all of the warnings will be suppressed. Treating the symbol as nullable unspecified also has an advantage that in case the takesNonNull function body is being inlined, the will be no warning, when the symbol is dereferenced. In case I have time after the initial version I might spend additional time to try to find a more sophisticated solution, in which we would produce the second warning (*). - -2) nonnull -============ - -- Dereferencing a nonnull, or sending message to it is ok. -- Converting nonnull to nullable is Ok. -- When there is an explicit cast from nonnull to nullable I will trust the cast (it is probable there for a reason, because this cast does not suppress any warnings or errors). -- But what should we do about null checks?:: - - __nonnull id takesNonnull(__nonnull id x) { - if (x == nil) { - // Defensive backward compatible code: - .... - return nil; <- Should the analyzer cover this piece of code? Should we require the cast (__nonnull)nil? - } - .... - } - -There are these directions: -- We can either take the branch; this way the branch is analyzed - - Should we not warn about any nullability issues in that branch? Probably not, it is ok to break the nullability postconditions when the nullability preconditions are violated. -- We can assume that these pointers are not null and we lose coverage with the analyzer. (This can be implemented either in constraint solver or in the checker itself.) 
- -Other Issues to keep in mind/take care of: -Messaging: -- Sending a message to a nullable pointer - - Even though the method might return a nonnull pointer, when it was sent to a nullable pointer the return type will be nullable. - - The result is nullable unless the receiver is known to be non null. -- Sending a message to a unspecified or nonnull pointer - - If the pointer is not assumed to be nil, we should be optimistic and use the nullability implied by the method. - - This will not happen automatically, since the AST will have null unspecified in this case. - -Inlining -============ - -A symbol may need to be treated differently inside an inlined body. For example, consider these conversions from nonnull to nullable in presence of inlining:: - - id obj = getNonnull(); - takesNullable(obj); - takesNonnull(obj); - - void takesNullable(nullable id obj) { - obj->ivar // we should assume obj is nullable and warn here - } - -With no special treatment, when the takesNullable is inlined the analyzer will not warn when the obj symbol is dereferenced. One solution for this is to reanalyze takesNullable as a top level function to get possible violations. The alternative method, deducing nullability information from the arguments after inlining is not robust enough (for example there might be more parameters with different nullability, but in the given path the two parameters might end up being the same symbol or there can be nested functions that take different view of the nullability of the same symbol). So the symbol will remain nonnull to avoid false positives but the functions that takes nullable parameters will be analyzed separately as well without inlining. - -Annotations on multi level pointers -============ - -Tracking multiple levels of annotations for pointers pointing to pointers would make the checker more complicated, because this way a vector of nullability qualifiers would be needed to be tracked for each symbol. 
This is not a big caveat, since once the top level pointer is dereferenced, the symvol for the inner pointer will have the nullability information. The lack of multi level annotation tracking only observable, when multiple levels of pointers are passed to a function which has a parameter with multiple levels of annotations. So for now the checker support the top level nullability qualifiers only.:: - - int * __nonnull * __nullable p; - int ** q = p; - takesStarNullableStarNullable(q); - -Implementation notes -============ - -What to track? -- The checker would track memory regions, and to each relevant region a qualifier information would be attached which is either nullable, nonnull or null unspecified (or contradicted to suppress warnings for a specific region). -- On a branch, where a nullable pointer is known to be non null, the checker treat it as a same way as a pointer annotated as nonnull. -- When there is an explicit cast from a null unspecified to either nonnull or nullable I will trust the cast. -- Unannotated pointers are treated the same way as pointers annotated with nullability unspecified qualifier, unless the region is wrapped in ASSUME_NONNULL macros. -- We might want to implement a callback for entry points to top level functions, where the pointer nullability assumptions would be made. Index: docs/conf.py =================================================================== --- docs/conf.py +++ docs/conf.py @@ -65,7 +65,7 @@ # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. -exclude_patterns = ['_build', 'analyzer'] +exclude_patterns = ['_build'] # The reST default role (used for this markup: `text`) to use for all documents. 
#default_role = None Index: docs/index.rst =================================================================== --- docs/index.rst +++ docs/index.rst @@ -23,6 +23,7 @@ AttributeReference DiagnosticsReference CrossCompilation + ClangStaticAnalyzer ThreadSafetyAnalysis AddressSanitizer ThreadSanitizer