This is an archive of the discontinued LLVM Phabricator instance.

Differential D20811

[analyzer] Model some library functions
ClosedPublic

Authored by NoQ on May 31 2016, 6:29 AM.


Details

Reviewers

dcoughlin
zaks.anna

Commits

rGbba497fb6502: [analyzer] Add StdLibraryFunctions checker.
rC284960: [analyzer] Add StdLibraryFunctions checker.
rL284960: [analyzer] Add StdLibraryFunctions checker.

Summary

I've put together a simple checker that throws no warnings, but models some library functions, which has already helped us to suppress some false positives in other checkers in our runs.

For pure functions, i chose the old evalCall() approach instead of the body-farm approach because i wanted to produce fewer state splits. For example, this checker produces a single exploded graph branch for ispunct()'s non-zero branch, when its argument is in range:

['!', '/'] U [':', '@'] U ['[', '`'] U ['{', '~']

I'm not sure if there's a way to write this out with if's and produce fewer than 4 branches. (Do we have any plans on merging branches more aggressively during analysis?) Because these functions are pure, we'd hardly ever want to catch them with `evalCall()` again in another checker.
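To make the branch-count argument concrete, here is a minimal sketch, not the checker's code, that contrasts the two ways of describing the ispunct() range. It assumes ASCII and the "C" locale; the names Interval, PunctRanges, inAnyRange and inAnyRangeWithIfs are hypothetical and exist only for this illustration (C++14 or later).

```cpp
#include <array>
#include <cstddef>

// Hypothetical helper type for illustration; not part of the analyzer.
struct Interval {
  int Lo;
  int Hi; // inclusive
};

// The four ASCII segments quoted in the summary above.
constexpr std::array<Interval, 4> PunctRanges = {{
    {'!', '/'}, {':', '@'}, {'[', '`'}, {'{', '~'}}};

// One membership query over the whole union. A model built on a range set
// only needs two outcomes: "inside the set" and "outside the set".
constexpr bool inAnyRange(int X) {
  for (std::size_t I = 0; I < PunctRanges.size(); ++I)
    if (X >= PunctRanges[I].Lo && X <= PunctRanges[I].Hi)
      return true;
  return false;
}

// The body-farm style alternative: every `if` is a separate point at which
// a path-sensitive analysis may split the state.
constexpr bool inAnyRangeWithIfs(int X) {
  if (X >= '!' && X <= '/') return true;
  if (X >= ':' && X <= '@') return true;
  if (X >= '[' && X <= '`') return true;
  if (X >= '{' && X <= '~') return true;
  return false;
}

// Both formulations agree on the values; they differ only in how many
// branch points they expose.
static_assert(inAnyRange('!') && inAnyRange('~'), "");
static_assert(!inAnyRange('a') && !inAnyRange('0'), "");
static_assert(inAnyRangeWithIfs('@') && !inAnyRangeWithIfs('A'), "");
```

The two functions compute the same predicate; the point of the summary is only that the second form hands the analyzer four conditions to branch on, while a range-set constraint can be applied in one step.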

Additionally, this checker's brace-initializers for function specifications are quite short - of course they're limited to very simple cases - the list of these cases can be expanded though.

The checker doesn't seem to be noticeably degrading performance. Here's an example of a false positive squashed:

Attachment: report-fdc422.html (135 KB)

Here line is taken to be "", the line++ statement is executed at least once (as seen in the exploded graph; the "entering loop body" diagnostic piece is missing because the loop condition has a complicated CFG, which is why it fails to highlight - a separate issue), and the analyzer fails to realize that isspace('\0') is false.
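The code from the attached report is not reproduced on this page, so the following is only a hypothetical reconstruction of the kind of loop being described. The function name skipSpaces is an assumption. For an empty string the loop body must never run, because isspace('\0') returns 0; an analyzer that treats isspace() as an unknown function may still explore the path on which the condition is true.

```cpp
#include <cctype>

// Hypothetical example of the pattern discussed above: skip leading
// whitespace and return a pointer to the first non-space character.
inline const char *skipSpaces(const char *Line) {
  // The cast avoids undefined behavior for negative char values.
  while (std::isspace(static_cast<unsigned char>(*Line)))
    ++Line; // Unreachable when *Line == '\0', since isspace('\0') == 0.
  return Line;
}
```

With a range-based model of isspace(), the analyzer can rule out the path on which Line is "" and the increment still executes.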

Diff Detail

Repository: rL LLVM

Event Timeline

NoQ updated this revision to Diff 59050. May 31 2016, 6:29 AM

NoQ retitled this revision to [analyzer] Model some library functions.

NoQ updated this object.

NoQ added reviewers: zaks.anna, dcoughlin.

NoQ added a subscriber: cfe-commits.

Herald added a subscriber: aemerson. May 31 2016, 6:29 AM

NoQ updated this object. May 31 2016, 6:35 AM

NoQ updated this object. May 31 2016, 6:37 AM

xazax.hun added a subscriber: xazax.hun. Jun 1 2016, 6:49 AM

It is great to model more widely used functions! However I think the LibraryFunctionsChecker name might be a bit too broad, wouldn't something like StdCLibraryFunctions be more informative?

Yeah, good point, the "Std" part should definitely appear in the name. Not sure about the "C" thing, as we could probably expand this checker to model some simple C++ functions as well (and then we'd make a separate checker section to move from unix. to cplusplus. or something like that; not sure, maybe we'd need to reside in core. anyway).

Thanks for the patch! Here are the comments, most of which are nits.

Could you add a high-level description of what we are doing somewhere, or maybe just describe the meaning of FunctionSpec, since it encodes how functions are modeled?

Also, we should explain why we are not using BodyFarm somewhere in the comment.

lib/StaticAnalyzer/Checkers/LibraryFunctionsChecker.cpp
10 ↗	(On Diff #59050)	Please, list the functions.
27 ↗	(On Diff #59050)	Naming looks odd: maybe "OutOfRange" and "WithinRange"?
33 ↗	(On Diff #59050)	nit: "is Unsigned" -> "as Unsigned". Please, use a typedef for the type as you are using it below in getArgType.
83 ↗	(On Diff #59050)	Are the types in FunctionSpec already canonical? If so, please, add a comment.
178 ↗	(On Diff #59050)	nit: If you could factor these out into separate helper functions, it would be easier to read. Lots of nesting.
219 ↗	(On Diff #59050)	What happens when NewState == State? I guess addTransition would just not do anything, but maybe we should just make the intent explicit and not call it at all.
245 ↗	(On Diff #59050)	Replace "Normal" with a more descriptive name.
273 ↗	(On Diff #59050)	Looks like you might want to have the checking code as a member on FunctionSpec.
322 ↗	(On Diff #59050)	Let's explain what we are doing next. For example, "Let's initialize the FunctionSpec for the functions we are modeling." Remove "NOTE:"
326 ↗	(On Diff #59050)	Not clear where this is used or if it is used in the initialization at all.
337 ↗	(On Diff #59050)	Is every item in the range set used to bifurcate the state?

Renamed the checker as xazax.hun suggested, added a lot more comments, done with inline comments :)

zaks.anna added inline comments. Jul 23 2016, 2:55 PM

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
191 ↗	(On Diff #65248)	Do we need to talk about crashes when describing what this does? Also, please, use doxygen throughout.
205 ↗	(On Diff #65248)	We could either provide these APIs in CallEvent or at least have variants that return canonical types. Maybe we already do some of that?
445 ↗	(On Diff #65248)	When can this go wrong? Are we checking if there is a mismatch between the function declaration and the call expression? It is strange that findFunctionSpec takes both of those. Couldn't you always get FunctionDecl out of CallExpr?
508 ↗	(On Diff #65248)	you could also use /NameOfTheField/ convention to name the arguments if that would make this map more readable.
test/Analysis/std-library-functions.c
3 ↗	(On Diff #65248)	Why are you not testing all of the functions?

NoQ updated this revision to Diff 65369. Jul 25 2016, 10:08 AM

NoQ marked 4 inline comments as done.

NoQ added inline comments.

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
191 ↗	(On Diff #65248)	Added more comments below.
205 ↗	(On Diff #65248)	Maybe a separate commit? There are quite a few checkers from which the `.getArgExpr(N)->getType()` pattern could be de-duplicated, but i don't think many of them are interested in canonical types.
445 ↗	(On Diff #65248)	Callee decl is path-sensitive information because functions can be passed around by pointers, as mentioned in the comment at the beginning of the function. Expanded the comment, added a test.
508 ↗	(On Diff #65248)	I think compactness is worth it here, and specs are pretty easy to remember, imho. Added an example to the first spec to see how it looks and make it easier for the reader to adapt and remember, but i'm not quite convinced that verbosity is worth it here.
test/Analysis/std-library-functions.c
3 ↗	(On Diff #65248)	I was too bored to generate tests for all branches of all functions (and if i auto-generate such tests, it defeats the purpose), but i added some of the more creative tests and covered at least some branches of all functions with them.
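The remark above that the callee declaration is path-sensitive can be illustrated with a small sketch. The names CharPredicate, pickPredicate and classify are hypothetical. The call expression inside classify() names no function at all; which library function is actually invoked depends on the path taken through pickPredicate().

```cpp
#include <ctype.h>

// Hypothetical example: the callee of the indirect call below is only
// known once a particular execution path has been chosen.
using CharPredicate = int (*)(int);

inline CharPredicate pickPredicate(bool WantDigits) {
  if (WantDigits)
    return isdigit;
  return isalpha;
}

inline bool classify(bool WantDigits, int C) {
  CharPredicate P = pickPredicate(WantDigits);
  // Syntactically this is a call through a pointer, so looking only at the
  // call expression does not say whether isdigit() or isalpha() runs.
  return P(C) != 0;
}
```

A checker that wants to apply a summary to such a call has to consult the path-specific callee rather than the syntax of the call expression alone.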

NoQ added inline comments. Jul 25 2016, 10:56 AM

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
192 ↗	(On Diff #65369)	Even though there are some doxygen-style comments in the checkers, i’ve never seen doxygen actually generate any docs for checker classes. Are they useful for IDE quick-hints only?

Answering myself: Hmm, so the only reason why MPI checker class appears in doxygen (http://clang.llvm.org/doxygen/classclang_1_1ento_1_1mpi_1_1MPIChecker.html) is because this class is not in anonymous namespace (as far as i understand, they needed to be multi-file for some reason). CheckerDocumentation says that every checker must be wrapped in anonymous namespace, except CheckerDocumentationChecker :)

I don’t really see a good reason for the library functions checker to be moved out of anonymous namespace or deserve a doxygen page - after all, it’s all in one file, and the docs are right in front of the reader’s eyes anyway. But maybe if this checker expands enough, we could expose its data structures into public use, and then they'd be worth documenting :)

dcoughlin added inline comments. Jul 27 2016, 3:47 PM

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
509 ↗	(On Diff #65369)	I disagree about compactness being valuable here. I think it is more important to intrinsically document the spec. These will be written once and read frequently. When they are written, they will be copied from a previous example -- probably by someone who is not familiar with the code or the spec format. Another possibility (not sure if it is the right one here) is to use macro tricks to define a simple DSL like Kulpreet did in the LocalizationChecker.cpp.

NoQ added inline comments. Jul 28 2016, 4:41 AM

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
509 ↗	(On Diff #65369)	"These will be written once and read frequently."

If only it was so :))

Hmm. What do you think of the following format? Macros mostly expand to empty or (argument), but it should be more readable than the /*...*/ noise.

SPEC {
  FOR_FUNCTION("isalnum"),
  SPEC_DATA {
    ARGUMENT_TYPES { IntTy },
    RETURN_TYPE(IntTy),
    INVALIDATION_APPROACH(EvalCallAsPure),
    BRANCHES {
      BRANCH { // Boils down to isupper() or islower() or isdigit()
        RANGE {
          ARG_NO(0), RANGE_KIND(WithinRange),
          SET { SEG('0', '9') U SEG('A', 'Z') U SEG('a', 'z') }
        },
        RANGE {
          RET_VAL, RANGE_KIND(OutOfRange),
          SET { SEG(0, 0) }
        }
      },
      BRANCH { // The locale-specific branch.
        RANGE {
          ARG_NO(0), RANGE_KIND(WithinRange),
          SET { SEG(128, 255) }
        }
      },
      BRANCH { // Other.
        RANGE {
          ARG_NO(0), RANGE_KIND(OutOfRange),
          SET { SEG('0', '9') U SEG('A', 'Z')
                              U SEG('a', 'z') U SEG(128, 255)}
        },
        RANGE {
          RET_VAL, RANGE_KIND(WithinRange),
          SET { SEG(0, 0) }
        }
      }
    }
  }
},

Even though there are some doxygen-style comments in the checkers, i’ve never seen doxygen actually generate any docs for checker classes.
Are they useful for IDE quick-hints only?

I think it's useful to have consistent documentation format.

In D20811#497585, @NoQ wrote:

Answering myself: Hmm, so the only reason why MPI checker class appears in doxygen (http://clang.llvm.org/doxygen/classclang_1_1ento_1_1mpi_1_1MPIChecker.html) is because this class is not in anonymous namespace (as far as i understand, they needed to be multi-file for some reason). CheckerDocumentation says that every checker must be wrapped in anonymous namespace, except CheckerDocumentationChecker :)

I don’t really see a good reason for the library functions checker to be moved out of anonymous namespace or deserve a doxygen page - after all, it’s all in one file, and the docs are right in front of the reader’s eyes anyway. But maybe if this checker expands enough, we could expose its data structures into public use, and then they'd be worth documenting :)

It has been originally written as a large set of files. If you feel strongly about it, we could merge it into a single file. That makes sense to me. @Alexander_Droste, what do you think?

zaks.anna added inline comments. Jul 29 2016, 5:46 PM

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
206 ↗	(On Diff #65369)	Separate commit is fine. I'd provide both APIs in CallEvent.
test/Analysis/std-library-functions.c
4 ↗	(On Diff #65369)	Ok.

It has been originally written as a large set of files. If you feel strongly about it, we could merge it into a single file. That makes sense to me. @Alexander_Droste, what do you think?

Hi,
I would still strongly prefer to keep them in separate files if possible. One of the headers (MPIFunctionClassifier.hpp)
also got moved to include/clang/StaticAnalyzer/Checkers, as it is needed by some MPI clang-tidy checks.
Is it really a problem if the checker comments are part of the Doxygen documentation? Further, I think that
the separation of concerns in form of distinct files might be valuable for people being new to the Clang Static Analyzer
framework, as the grouping of functionality is visible on a higher level of abstraction. Regardless, I would of course
accept if you prefer to merge the files into a single one, excluding the MPIFunctionClassifier.hpp header.

Is it really a problem if the checker comments are part of the Doxygen documentation?

Of course not :) I've been mostly thinking about the benefits of the anonymous namespace itself (cleaner global scope, no name collisions, but even these benefits are extremely minor).

Added a huge amount of macros in order to improve readability of function specs.
Other inline comments should have been addressed before.

Herald added subscribers: mgorny, beanz. Sep 15 2016, 8:43 AM

Thanks for adding the macros. I've provided some feedback inline.

I think a good rule of thumb for readability is: suppose you are a maintainer and need to add a summary for a new function. Can you copy the summary for an existing function and figure out what each component means so you can change it for the new function?

include/clang/StaticAnalyzer/Checkers/Checkers.td
419 ↗	(On Diff #71510)	I know you and Gábor already discussed this -- but shouldn't this be CStdLibraryFunctionsChecker or 'StdCLibraryFunctionsChecker'? Or is it your intent that both C and C++ standard libraries would be modeled by this checker?
lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
11 ↗	(On Diff #71510)	"throw" --> "generate"
534 ↗	(On Diff #71510)	Is "specification" the right term here? Or is this really a "summary"?
536 ↗	(On Diff #71510)	'SPEC_DATA' doesn't seem to add much in terms of readability. Is it needed?
537 ↗	(On Diff #71510)	The argument and return types seem like more a property of the function than the summary. Why are they here and not with the function name?
540 ↗	(On Diff #71510)	"Cases" seems more appropriate than "branches" (branching is an implementation detail).
542 ↗	(On Diff #71510)	If I understand correctly, the first "argument" to branch describes the constraints on the function arguments and the second (if present) describes the resulting constraint on the return value when the argument constraint holds. Is there a way to make this apparent in the spelling of the summary? As a straw proposal, what about renaming the first 'RANGE' to 'ARGUMENT_CONSTRAINT' and the second to 'RETURN_VALUE_CONSTRAINT'? Or, more jargony, "PRECONDITION" and "POSTCONDITION"?
547 ↗	(On Diff #71510)	Is it ever the case that this final 'RANGE' constrains anything other than the return value? If not, can 'RET_VAL' be elided?
554 ↗	(On Diff #71510)	What is the motivation behind the use of geometric terms here (i.e., "SEG", "POINT")? Why not "INTERVAL" and "EXACT_VALUE"?
560 ↗	(On Diff #71510)	Would you be opposed to 'UNION' instead of 'U'?

In D20811#544250, @dcoughlin wrote:

I think a good rule of thumb for readability is: suppose you are a maintainer and need to add a summary for a new function. Can you copy the summary for an existing function and figure out what each component means so you can change it for the new function?

Seems i've written too many summaries to reliably use this rule :)

Could you have a look at another attempt?

SUMMARY(isalnum, ARGUMENT_TYPES { IntTy }, RETURN_TYPE(IntTy),
        INVALIDATION_APPROACH(EvalCallAsPure))
  CASE // Boils down to isupper() or islower() or isdigit()
    PRE_CONDITION(ARG_NO(0), CONDITION_KIND(WithinRange))
      RANGE('0', '9')
      RANGE('A', 'Z')
      RANGE('a', 'z')
    END_PRE_CONDITION
    POST_CONDITION(OutOfRange)
      VALUE(0)
    END_POST_CONDITION
  END_CASE
  CASE // The locale-specific range.
    PRE_CONDITION(ARG_NO(0), CONDITION_KIND(WithinRange))
      RANGE(128, 255)
    END_PRE_CONDITION
    // No post-condition. We are completely unaware of
    // locale-specific return values.
  END_CASE
  CASE
    PRE_CONDITION(ARG_NO(0), CONDITION_KIND(OutOfRange))
      RANGE('0', '9')
      RANGE('A', 'Z')
      RANGE('a', 'z')
      RANGE(128, 255)
    END_PRE_CONDITION
    POST_CONDITION(WithinRange)
      VALUE(0)
    END_POST_CONDITION
  END_CASE
END_SUMMARY

include/clang/StaticAnalyzer/Checkers/Checkers.td
419 ↗	(On Diff #71510)	Hmm, i just realized what you guys were talking about :) The same checker cpp file and even the same checker object should probably produce different checker list entries here which would go into separate packages (cplusplus for C++ library functions, etc.). We could even split the specifications into different files, but the checker object would still be the same, defined in the same file. Will do.
lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
537 ↗	(On Diff #71510)	Because this is where C++ initializer list syntax forces them to be. Hiding this detail is, as far as i see, only possible with the means of BEGIN_.../END_... macros (which isn't a big deal i think).
547 ↗	(On Diff #71510)	Some summaries only have pre-conditions: "for this argument constraint, any return value is possible". We should also be able to support void functions, which have no return values.

I think this is much clearer! That said, now that I look at it with 'POSTCONDITION' alone I don't think it is clear that the provided value describes the return value. What do you think about renaming it 'RETURN_VALUE'? Or adding back the RET_VAL I asked you about removing before? :-)

Also: do you think CONDITION_KIND is needed in PRECONDITION? Or can the bare kind be used like in POSTCONDITION?

In D20811#544927, @dcoughlin wrote:

That said, now that I look at it with 'POSTCONDITION' alone I don't think it is clear that the provided value describes the return value. What do you think about renaming it 'RETURN_VALUE'? Or adding back the RET_VAL I asked you about removing before? :-)

Hmm, what about

CONSTRAIN
  ARGUMENT_VALUE(0, WithinRange)
    RANGE('0', '9')
    RANGE('A', 'Z')
    RANGE('a', 'z')
  END_ARGUMENT_VALUE
  RETURN_VALUE(OutOfRange)
    VALUE(0)
  END_RETURN_VALUE
END_CONSTRAIN

Something i don't like here is that the word "value" is overloaded. Maybe rename the inner VALUE back to POINT?

In D20811#544927, @dcoughlin wrote:

Also: do you think CONDITION_KIND is needed in PRECONDITION? Or can the bare kind be used like in POSTCONDITION?

I agree that it's ok to use the bare kind, because it's quite self-explanatory.

In D20811#544981, @NoQ wrote:

Hmm, what about

CONSTRAIN
  ARGUMENT_VALUE(0, WithinRange)
    RANGE('0', '9')
    RANGE('A', 'Z')
    RANGE('a', 'z')
  END_ARGUMENT_VALUE
  RETURN_VALUE(OutOfRange)
    VALUE(0)
  END_RETURN_VALUE
END_CONSTRAIN

"CONSTRAIN" is a verb. What is the direct object here? It seems to me that the thing being constrained is the return value, so it seems odd to have 'CONSTRAIN' around the conditions on the arguments.

Something i don't like here is that the word "value" is overloaded. Maybe rename the inner VALUE back to POINT?

I don't think the geometric metaphor of 'POINT' makes sense here, especially with 'RANGE' (which I think is very good). What is the analog of a range that has only a single element?

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
547 ↗	(On Diff #71510)	What does a postcondition on a void function mean in this context? Can you refer to argument values? Such as "If the function terminates then it must have been the case that the first argument was in the range x..z even though we didn't know that going in"? Is this useful?

Ping? Is there something blocking progress here? This functionality is very useful and almost done.

Thanks!

zaks.anna accepted this revision. Oct 12 2016, 9:55 PM

zaks.anna edited edge metadata.

This revision is now accepted and ready to land. Oct 12 2016, 9:55 PM

NoQ mentioned this in D25660: [Analyzer] Checker for iterators dereferenced beyond their range. Oct 18 2016, 1:57 AM

I thought to give it a pause to take a fresh look at how to arrange the macro-hints in the summaries.

Maybe something like that:

CASE
  ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
    RANGE('0', '9')
    RANGE('A', 'Z')
    RANGE('a', 'z')
    RANGE(128, 255)
  END_ARGUMENT_CONDITION
  RETURN_VALUE_CONDITION(WithinRange)
    SINGLE_VALUE(0)
  END_RETURN_VALUE_CONDITION
END_CASE

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
547 ↗	(On Diff #71510)	No, i don't think this is useful. There are just timeless immutable symbols about which we learn something new on every branch. If the function doesn't terminate on certain pre-conditions, then we can model it by never mentioning these pre-conditions in any of the branches (we don't use this trick anywhere yet - all functions listed here shall terminate in all cases). This would have been useful if we start referring to the heap shape (eg. "if the value behind the pointer passed as second argument to the call was in range [1,10] before the call, then it would be equal to 42 after the call"), but we don't do that yet.
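As a rough illustration of the "learn something new on every branch" idea, here is a toy model of how the cases of a summary partition the possible argument values. It is a sketch under heavy assumptions: the real checker delegates to the analyzer's constraint manager, while this version simply enumerates a known inclusive interval. All names (ArgCondition, inRanges, isFeasible, countFeasibleCases, isalnumCases) are hypothetical, and only argument pre-conditions are modeled.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the checker's types.
using IntRangeVector = std::vector<std::pair<int64_t, int64_t>>;
enum RangeKind { OutOfRange, WithinRange };

struct ArgCondition {
  RangeKind Kind;
  IntRangeVector Ranges; // inclusive segments
};

inline bool inRanges(const IntRangeVector &Ranges, int64_t V) {
  for (const auto &R : Ranges)
    if (V >= R.first && V <= R.second)
      return true;
  return false;
}

// A case is feasible if at least one value the argument may take,
// here any value in [Lo, Hi], satisfies its pre-condition.
inline bool isFeasible(const ArgCondition &C, int64_t Lo, int64_t Hi) {
  for (int64_t V = Lo; V <= Hi; ++V)
    if (inRanges(C.Ranges, V) == (C.Kind == WithinRange))
      return true;
  return false;
}

// Each feasible case corresponds to one successor state after the call.
inline unsigned countFeasibleCases(const std::vector<ArgCondition> &Cases,
                                   int64_t Lo, int64_t Hi) {
  unsigned N = 0;
  for (const ArgCondition &C : Cases)
    if (isFeasible(C, Lo, Hi))
      ++N;
  return N;
}

// Pre-conditions of the three isalnum cases quoted earlier in the thread.
inline std::vector<ArgCondition> isalnumCases() {
  return {
      {WithinRange, {{'0', '9'}, {'A', 'Z'}, {'a', 'z'}}},
      {WithinRange, {{128, 255}}},
      {OutOfRange, {{'0', '9'}, {'A', 'Z'}, {'a', 'z'}, {128, 255}}},
  };
}
```

In this model an argument already known to be a lowercase letter leaves a single feasible case, while a completely unknown argument in [0, 255] leaves all three. A summary that mentions no case for some argument value would leave no successor for it, which is the non-termination trick described in the comment above.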

In D20811#575521, @NoQ wrote:
I thought to give it a pause to take a fresh look at how to arrange the macro-hints in the summaries.

Maybe something like that:
CASE
  ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
    RANGE('0', '9')
    RANGE('A', 'Z')
    RANGE('a', 'z')
    RANGE(128, 255)
  END_ARGUMENT_CONDITION
  RETURN_VALUE_CONDITION(WithinRange)
    SINGLE_VALUE(0)
  END_RETURN_VALUE_CONDITION
END_CASE

Looks great to me!
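For readers wondering how such macro hints can coexist with plain brace-initializers, here is a minimal sketch. The macro bodies and the simplified types (ValueRange, Case, ArgNo) are assumptions made for this illustration, guided only by the earlier remark that the macros "mostly expand to empty or (argument)"; they are not copied from the patch.

```cpp
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the checker's types.
using IntRangeVector = std::vector<std::pair<int64_t, int64_t>>;
enum RangeKind { OutOfRange, WithinRange };
using ArgNo = uint32_t;
constexpr ArgNo Ret = std::numeric_limits<ArgNo>::max(); // "return value"

struct ValueRange {
  ArgNo Arg;
  RangeKind Kind;
  IntRangeVector Ranges;
};
using Case = std::vector<ValueRange>;

// Assumed expansions: each macro contributes only braces, commas and its
// arguments, so the summary is still an ordinary nested initializer list.
#define CASE {
#define END_CASE }
#define ARGUMENT_CONDITION(argno, kind) {argno, kind, IntRangeVector{
#define END_ARGUMENT_CONDITION }},
#define RETURN_VALUE_CONDITION(kind) {Ret, kind, IntRangeVector{
#define END_RETURN_VALUE_CONDITION }},
#define ARG_NO(x) x##U
#define RANGE(lo, hi) {lo, hi},
#define SINGLE_VALUE(x) RANGE(x, x)

// The "other" case of isalnum, spelled exactly as in the proposal above.
inline Case isalnumOtherCase() {
  return
    CASE
      ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
        RANGE('0', '9')
        RANGE('A', 'Z')
        RANGE('a', 'z')
        RANGE(128, 255)
      END_ARGUMENT_CONDITION
      RETURN_VALUE_CONDITION(WithinRange)
        SINGLE_VALUE(0)
      END_RETURN_VALUE_CONDITION
    END_CASE;
}
```

The trailing commas produced by RANGE and the END_ macros are legal in braced initializer lists, which is what lets each line of the summary stand on its own without separators.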

Update the domain-specific language for function specs/summaries.

Herald added a subscriber: modocache. Oct 21 2016, 10:12 AM

This looks great!

lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
694 ↗	(On Diff #75446)	I think this comment should say 'lowercase'.
702 ↗	(On Diff #75446)	Same here.

Closed by commit rL284960: [analyzer] Add StdLibraryFunctions checker. (authored by dergachev). Oct 24 2016, 2:51 AM

This revision was automatically updated to reflect the committed changes.

NoQ mentioned this in D27918: [analyzer] OStreamChecker. Apr 4 2017, 7:50 AM

NoQ mentioned this in D69662: [Checkers] Avoid using evalCall in StreamChecker. Nov 4 2019, 10:46 AM

Revision Contents

Path	Size
cfe/trunk/include/clang/StaticAnalyzer/Checkers/Checkers.td	4 lines
cfe/trunk/lib/StaticAnalyzer/Checkers/CMakeLists.txt	1 line
cfe/trunk/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp	943 lines
cfe/trunk/test/Analysis/std-c-library-functions.c	184 lines
cfe/trunk/test/Analysis/std-c-library-functions.cpp	14 lines

Diff 75560

cfe/trunk/include/clang/StaticAnalyzer/Checkers/Checkers.td

  def MismatchedDeallocatorChecker : Checker<"MismatchedDeallocator">,
    HelpText<"Check for mismatched deallocators.">,
    DescFile<"MallocChecker.cpp">;

  def VforkChecker : Checker<"Vfork">,
    HelpText<"Check for proper usage of vfork">,
    DescFile<"VforkChecker.cpp">;

+ def StdCLibraryFunctionsChecker : Checker<"StdCLibraryFunctions">,
+   HelpText<"Improve modeling of the C standard library functions">,
+   DescFile<"StdLibraryFunctionsChecker.cpp">;
+
  } // end "unix"

  let ParentPackage = UnixAlpha in {

  def ChrootChecker : Checker<"Chroot">,
    HelpText<"Check improper use of chroot">,
    DescFile<"ChrootChecker.cpp">;

cfe/trunk/lib/StaticAnalyzer/Checkers/CMakeLists.txt

  add_clang_library(clangStaticAnalyzerCheckers
    ...
    PointerArithChecker.cpp
    PointerSubChecker.cpp
    PthreadLockChecker.cpp
    RetainCountChecker.cpp
    ReturnPointerRangeChecker.cpp
    ReturnUndefChecker.cpp
    SimpleStreamChecker.cpp
    StackAddrEscapeChecker.cpp
+   StdLibraryFunctionsChecker.cpp
    StreamChecker.cpp
    TaintTesterChecker.cpp
    TestAfterDivZeroChecker.cpp
    TraversalChecker.cpp
    UndefBranchChecker.cpp
    UndefCapturedBlockVarChecker.cpp
    UndefResultChecker.cpp
    UndefinedArraySubscriptChecker.cpp
    ...

cfe/trunk/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp

				//=== StdLibraryFunctionsChecker.cpp - Model standard functions -- C++ --===//
				//
				// The LLVM Compiler Infrastructure
				//
				// This file is distributed under the University of Illinois Open Source
				// License. See LICENSE.TXT for details.
				//
				//===----------------------------------------------------------------------===//
				//
				// This checker improves modeling of a few simple library functions.
				// It does not generate warnings.
				//
				// This checker provides a specification format - `FunctionSummaryTy' - and
				// contains descriptions of some library functions in this format. Each
				// specification contains a list of branches for splitting the program state
				// upon call, and range constraints on argument and return-value symbols that
				// are satisfied on each branch. This spec can be expanded to include more
				// items, like external effects of the function.
				//
				// The main difference between this approach and the body farms technique is
				// in more explicit control over how many branches are produced. For example,
				// consider standard C function `ispunct(int x)', which returns a non-zero value
				// iff `x' is a punctuation character, that is, when `x' is in range
				// ['!', '/'] U [':', '@'] U ['[', '`'] U ['{', '~'].
				// `FunctionSummaryTy' provides only two branches for this function. However,
				// any attempt to describe this range with if-statements in the body farm
				// would result in many more branches. Because each branch needs to be analyzed
				// independently, this significantly reduces performance. Additionally,
				// once we consider a branch on which `x' is in range, say, ['!', '/'],
				// we assume that such branch is an important separate path through the program,
				// which may lead to false positives because considering this particular path
				// was not consciously intended, and therefore it might have been unreachable.
				//
				// This checker uses eval::Call for modeling "pure" functions, for which
				// their `FunctionSummaryTy' is a precise model. This avoids unnecessary
				// invalidation passes. Conflicts with other checkers are unlikely because
				// if the function has no other effects, other checkers would probably never
				// want to improve upon the modeling done by this checker.
				//
				// Non-"pure" functions, for which only partial improvement over the default
				// behavior is expected, are modeled via check::PostCall, non-intrusively.
				//
				// The following standard C functions are currently supported:
				//
				// fgetc getline isdigit isupper
				// fread isalnum isgraph isxdigit
				// fwrite isalpha islower read
				// getc isascii isprint write
				// getchar isblank ispunct
				// getdelim iscntrl isspace
				//
				//===----------------------------------------------------------------------===//

				#include "ClangSACheckers.h"
				#include "clang/StaticAnalyzer/Core/Checker.h"
				#include "clang/StaticAnalyzer/Core/CheckerManager.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

				using namespace clang;
				using namespace clang::ento;

				namespace {
				class StdLibraryFunctionsChecker : public Checker<check::PostCall, eval::Call> {
				/// Below is a series of typedefs necessary to define function specs.
				/// We avoid nesting types here because each additional qualifier
				/// would need to be repeated in every function spec.
				struct FunctionSummaryTy;

				/// Specify how much the analyzer engine should entrust modeling this function
				/// to us. If it doesn't, it performs additional invalidations.
				enum InvalidationKindTy { NoEvalCall, EvalCallAsPure };

				/// A pair of ValueRangeKindTy and IntRangeVectorTy would describe a range
				/// imposed on a particular argument or return value symbol.
				///
				/// Given a range, should the argument stay inside or outside this range?
				/// The special `ComparesToArgument' value indicates that we should
				/// impose a constraint that involves other argument or return value symbols.
				enum ValueRangeKindTy { OutOfRange, WithinRange, ComparesToArgument };

				/// Normally, describes a single range constraint, eg. {{0, 1}, {3, 4}} is
				/// a non-negative integer, which is less than 5 and not equal to 2. For
				/// `ComparesToArgument', holds information about how exactly to compare to
				/// the argument.
				typedef std::vector<std::pair<int64_t, int64_t>> IntRangeVectorTy;

				/// A reference to an argument or return value by its number.
				/// ArgNo in CallExpr and CallEvent is defined as Unsigned, but
				/// obviously uint32_t should be enough for all practical purposes.
				typedef uint32_t ArgNoTy;
				static const ArgNoTy Ret = std::numeric_limits<ArgNoTy>::max();

				/// Encapsulates a single range on a single symbol within a branch.
				class ValueRange {
				ArgNoTy ArgNo; // Argument to which we apply the range.
				ValueRangeKindTy Kind; // Kind of range definition.
				IntRangeVectorTy Args; // Polymorphic arguments.

				public:
				ValueRange(ArgNoTy ArgNo, ValueRangeKindTy Kind,
				const IntRangeVectorTy &Args)
				: ArgNo(ArgNo), Kind(Kind), Args(Args) {}

				ArgNoTy getArgNo() const { return ArgNo; }
				ValueRangeKindTy getKind() const { return Kind; }

				BinaryOperator::Opcode getOpcode() const {
				assert(Kind == ComparesToArgument);
				assert(Args.size() == 1);
				BinaryOperator::Opcode Op =
				static_cast<BinaryOperator::Opcode>(Args[0].first);
				assert(BinaryOperator::isComparisonOp(Op) &&
				"Only comparison ops are supported for ComparesToArgument");
				return Op;
				}

				ArgNoTy getOtherArgNo() const {
				assert(Kind == ComparesToArgument);
				assert(Args.size() == 1);
				return static_cast<ArgNoTy>(Args[0].second);
				}

				const IntRangeVectorTy &getRanges() const {
				assert(Kind != ComparesToArgument);
				return Args;
				}

				// We avoid creating a virtual apply() method because
				// it makes initializer lists harder to write.
				private:
				ProgramStateRef
				applyAsOutOfRange(ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const;
				ProgramStateRef
				applyAsWithinRange(ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const;
				ProgramStateRef
				applyAsComparesToArgument(ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const;

				public:
				ProgramStateRef apply(ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const {
				switch (Kind) {
				case OutOfRange:
				return applyAsOutOfRange(State, Call, Summary);
				case WithinRange:
				return applyAsWithinRange(State, Call, Summary);
				case ComparesToArgument:
				return applyAsComparesToArgument(State, Call, Summary);
				}
				llvm_unreachable("Unknown ValueRange kind!");
				}
				};

				/// The complete list of ranges that defines a single branch.
				typedef std::vector<ValueRange> ValueRangeSet;

				/// Includes information about function prototype (which is necessary to
				/// ensure we're modeling the right function and casting values properly),
				/// approach to invalidation, and a list of branches - essentially, a list
				/// of lists of ranges - that is, a list of lists of lists of segments.
				struct FunctionSummaryTy {
				const std::vector<QualType> ArgTypes;
				const QualType RetType;
				const InvalidationKindTy InvalidationKind;
				const std::vector<ValueRangeSet> Ranges;

				private:
				static void assertTypeSuitableForSummary(QualType T) {
				assert(!T->isVoidType() &&
				"We should have had no significant void types in the spec");
				assert(T.isCanonical() &&
				"We should only have canonical types in the spec");
				// FIXME: lift this assert (but not the ones above!)
				assert(T->isIntegralOrEnumerationType() &&
				"We only support integral ranges in the spec");
				}

				public:
				QualType getArgType(ArgNoTy ArgNo) const {
				QualType T = (ArgNo == Ret) ? RetType : ArgTypes[ArgNo];
				assertTypeSuitableForSummary(T);
				return T;
				}

				/// Try our best to figure out if the call expression is the call of
				/// the library function to which this specification applies.
				bool matchesCall(const CallExpr *CE) const;
				};

				// The map of all functions supported by the checker. It is initialized
				// lazily, and it doesn't change after initialization.
				typedef llvm::StringMap<FunctionSummaryTy> FunctionSummaryMapTy;
				mutable FunctionSummaryMapTy FunctionSummaryMap;

				// Auxiliary functions to support ArgNoTy within all structures
				// in a unified manner.
				static QualType getArgType(const FunctionSummaryTy &Summary, ArgNoTy ArgNo) {
				return Summary.getArgType(ArgNo);
				}
				static QualType getArgType(const CallEvent &Call, ArgNoTy ArgNo) {
				return ArgNo == Ret ? Call.getResultType().getCanonicalType()
				: Call.getArgExpr(ArgNo)->getType().getCanonicalType();
				}
				static QualType getArgType(const CallExpr *CE, ArgNoTy ArgNo) {
				return ArgNo == Ret ? CE->getType().getCanonicalType()
				: CE->getArg(ArgNo)->getType().getCanonicalType();
				}
				static SVal getArgSVal(const CallEvent &Call, ArgNoTy ArgNo) {
				return ArgNo == Ret ? Call.getReturnValue() : Call.getArgSVal(ArgNo);
				}

				public:
				void checkPostCall(const CallEvent &Call, CheckerContext &C) const;
				bool evalCall(const CallExpr *CE, CheckerContext &C) const;

				private:
				Optional<FunctionSummaryTy> findFunctionSummary(const FunctionDecl *FD,
				const CallExpr *CE,
				CheckerContext &C) const;

				void initFunctionSummaries(BasicValueFactory &BVF) const;
				};
				} // end of anonymous namespace

				ProgramStateRef StdLibraryFunctionsChecker::ValueRange::applyAsOutOfRange(
				ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const {

				ProgramStateManager &Mgr = State->getStateManager();
				SValBuilder &SVB = Mgr.getSValBuilder();
				BasicValueFactory &BVF = SVB.getBasicValueFactory();
				ConstraintManager &CM = Mgr.getConstraintManager();
				QualType T = getArgType(Summary, getArgNo());
				SVal V = getArgSVal(Call, getArgNo());

				if (auto N = V.getAs<NonLoc>()) {
				const IntRangeVectorTy &R = getRanges();
				size_t E = R.size();
				for (size_t I = 0; I != E; ++I) {
				const llvm::APSInt &Min = BVF.getValue(R[I].first, T);
				const llvm::APSInt &Max = BVF.getValue(R[I].second, T);
				assert(Min <= Max);
				State = CM.assumeWithinInclusiveRange(State, *N, Min, Max, false);
				if (!State)
				break;
				}
				}

				return State;
				}

				ProgramStateRef
				StdLibraryFunctionsChecker::ValueRange::applyAsWithinRange(
				ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const {

				ProgramStateManager &Mgr = State->getStateManager();
				SValBuilder &SVB = Mgr.getSValBuilder();
				BasicValueFactory &BVF = SVB.getBasicValueFactory();
				ConstraintManager &CM = Mgr.getConstraintManager();
				QualType T = getArgType(Summary, getArgNo());
				SVal V = getArgSVal(Call, getArgNo());

				// "WithinRange R" is treated as "outside [T_MIN, T_MAX] \ R".
				// We cut off [T_MIN, min(R) - 1] and [max(R) + 1, T_MAX] if necessary,
				// and then cut away all holes in R one by one.
				if (auto N = V.getAs<NonLoc>()) {
				const IntRangeVectorTy &R = getRanges();
				size_t E = R.size();

				const llvm::APSInt &MinusInf = BVF.getMinValue(T);
				const llvm::APSInt &PlusInf = BVF.getMaxValue(T);

				const llvm::APSInt &Left = BVF.getValue(R[0].first - 1, T);
				if (Left != PlusInf) {
				assert(MinusInf <= Left);
				State = CM.assumeWithinInclusiveRange(State, *N, MinusInf, Left, false);
				if (!State)
				return nullptr;
				}

				const llvm::APSInt &Right = BVF.getValue(R[E - 1].second + 1, T);
				if (Right != MinusInf) {
				assert(Right <= PlusInf);
				State = CM.assumeWithinInclusiveRange(State, *N, Right, PlusInf, false);
				if (!State)
				return nullptr;
				}

				for (size_t I = 1; I != E; ++I) {
				const llvm::APSInt &Min = BVF.getValue(R[I - 1].second + 1, T);
				const llvm::APSInt &Max = BVF.getValue(R[I].first - 1, T);
				assert(Min <= Max);
				State = CM.assumeWithinInclusiveRange(State, *N, Min, Max, false);
				if (!State)
				return nullptr;
				}
				}

				return State;
				}

				ProgramStateRef
				StdLibraryFunctionsChecker::ValueRange::applyAsComparesToArgument(
				ProgramStateRef State, const CallEvent &Call,
				const FunctionSummaryTy &Summary) const {

				ProgramStateManager &Mgr = State->getStateManager();
				SValBuilder &SVB = Mgr.getSValBuilder();
				QualType CondT = SVB.getConditionType();
				QualType T = getArgType(Summary, getArgNo());
				SVal V = getArgSVal(Call, getArgNo());

				BinaryOperator::Opcode Op = getOpcode();
				ArgNoTy OtherArg = getOtherArgNo();
				SVal OtherV = getArgSVal(Call, OtherArg);
				QualType OtherT = getArgType(Call, OtherArg);
				// Note: we avoid integral promotion for comparison.
				OtherV = SVB.evalCast(OtherV, T, OtherT);
				if (auto CompV = SVB.evalBinOp(State, Op, V, OtherV, CondT)
				.getAs<DefinedOrUnknownSVal>())
				State = State->assume(*CompV, true);
				return State;
				}

				void StdLibraryFunctionsChecker::checkPostCall(const CallEvent &Call,
				CheckerContext &C) const {
				const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(Call.getDecl());
				if (!FD)
				return;

				const CallExpr *CE = dyn_cast_or_null<CallExpr>(Call.getOriginExpr());
				if (!CE)
				return;

				Optional<FunctionSummaryTy> FoundSummary = findFunctionSummary(FD, CE, C);
				if (!FoundSummary)
				return;

				// Now apply ranges.
				const FunctionSummaryTy &Summary = *FoundSummary;
				ProgramStateRef State = C.getState();

				for (const auto &VRS: Summary.Ranges) {
				ProgramStateRef NewState = State;
				for (const auto &VR: VRS) {
				NewState = VR.apply(NewState, Call, Summary);
				if (!NewState)
				break;
				}

				if (NewState && NewState != State)
				C.addTransition(NewState);
				}
				}

				bool StdLibraryFunctionsChecker::evalCall(const CallExpr *CE,
				CheckerContext &C) const {
				const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(CE->getCalleeDecl());
				if (!FD)
				return false;

				Optional<FunctionSummaryTy> FoundSummary = findFunctionSummary(FD, CE, C);
				if (!FoundSummary)
				return false;

				const FunctionSummaryTy &Summary = *FoundSummary;
				switch (Summary.InvalidationKind) {
				case EvalCallAsPure: {
				ProgramStateRef State = C.getState();
				const LocationContext *LC = C.getLocationContext();
				SVal V = C.getSValBuilder().conjureSymbolVal(
				CE, LC, CE->getType().getCanonicalType(), C.blockCount());
				State = State->BindExpr(CE, LC, V);
				C.addTransition(State);
				return true;
				}
				case NoEvalCall:
				// Summary tells us to avoid performing eval::Call. The function is possibly
				// evaluated by another checker, or evaluated conservatively.
				return false;
				}
				llvm_unreachable("Unknown invalidation kind!");
				}

				bool StdLibraryFunctionsChecker::FunctionSummaryTy::matchesCall(
				const CallExpr *CE) const {
				// Check number of arguments:
				if (CE->getNumArgs() != ArgTypes.size())
				return false;

				// Check return type if relevant:
				if (!RetType.isNull() && RetType != CE->getType().getCanonicalType())
				return false;

				// Check argument types when relevant:
				for (size_t I = 0, E = ArgTypes.size(); I != E; ++I) {
				QualType FormalT = ArgTypes[I];
				// Null type marks irrelevant arguments.
				if (FormalT.isNull())
				continue;

				assertTypeSuitableForSummary(FormalT);

				QualType ActualT = StdLibraryFunctionsChecker::getArgType(CE, I);
				assert(ActualT.isCanonical());
				if (ActualT != FormalT)
				return false;
				}

				return true;
				}

				Optional<StdLibraryFunctionsChecker::FunctionSummaryTy>
				StdLibraryFunctionsChecker::findFunctionSummary(const FunctionDecl *FD,
				const CallExpr *CE,
				CheckerContext &C) const {
				// Note: we cannot always obtain FD from CE
				// (eg. virtual call, or call by pointer).
				assert(CE);

				if (!FD)
				return None;

				SValBuilder &SVB = C.getSValBuilder();
				BasicValueFactory &BVF = SVB.getBasicValueFactory();
				initFunctionSummaries(BVF);

				std::string Name = FD->getQualifiedNameAsString();
				if (Name.empty() || !C.isCLibraryFunction(FD, Name))
				return None;

				auto FSMI = FunctionSummaryMap.find(Name);
				if (FSMI == FunctionSummaryMap.end())
				return None;

				// Verify that function signature matches the spec in advance.
				// Otherwise we might be modeling the wrong function.
				// Strict checking is important because we will be conducting
				// very integral-type-sensitive operations on arguments and
				// return values.
				const FunctionSummaryTy &Spec = FSMI->second;
				if (!Spec.matchesCall(CE))
				return None;

				return Spec;
				}

				void StdLibraryFunctionsChecker::initFunctionSummaries(
				BasicValueFactory &BVF) const {
				if (!FunctionSummaryMap.empty())
				return;

				ASTContext &ACtx = BVF.getContext();

				// These types are useful for writing specifications quickly.
				// New specifications should probably introduce more types.
				QualType Irrelevant; // A placeholder, whenever we do not care about the type.
				QualType IntTy = ACtx.IntTy;
				QualType SizeTy = ACtx.getSizeType();
				QualType SSizeTy = ACtx.getIntTypeForBitwidth(ACtx.getTypeSize(SizeTy), true);

				// Don't worry about truncation here, it'd be cast back to SIZE_MAX when used.
				LLVM_ATTRIBUTE_UNUSED int64_t SizeMax =
				BVF.getMaxValue(SizeTy).getLimitedValue();
				int64_t SSizeMax =
				BVF.getMaxValue(SSizeTy).getLimitedValue();

				// We are finally ready to define specifications for all supported functions.
				//
				// The signature needs to have the correct number of arguments.
				// However, we insert `Irrelevant' when the type is insignificant.
				//
				// Argument ranges should always cover all variants. If return value
				// is completely unknown, omit it from the respective range set.
				//
				// All types in the spec need to be canonical.
				//
				// Every item in the list of range sets represents a particular
				// execution path the analyzer would need to explore once
				// the call is modeled - a new program state is constructed
				// for every range set, and each range line in the range set
				// corresponds to a specific constraint within this state.
				//
				// Upon comparing to another argument, the other argument is cast
				// to the current argument's type. This avoids proper promotion but
				// seems useful. For example, read() receives size_t argument,
				// and its return value, which is of type ssize_t, cannot be greater
				// than this argument. If we made a promotion, and the size argument
				// is equal to, say, 10, then we'd impose a range of [0, 10] on the
				// return value, however the correct range is [-1, 10].
				//
				// Please update the list of functions in the header after editing!
				//
				// The format is as follows:
				//
				//{ "function name",
				// { spec:
				// { argument types list, ... },
				// return type, purity, { range set list:
				// { range list:
				// { argument index, within or out of, {{from, to}, ...} },
				// { argument index, compares to argument, {{how, which}} },
				// ...
				// }
				// }
				// }
				//}

				#define SUMMARY(identifier, argument_types, return_type, \
				invalidation_approach) \
				{#identifier, {argument_types, return_type, invalidation_approach, {
				#define END_SUMMARY }}},
				#define ARGUMENT_TYPES(...) { __VA_ARGS__ }
				#define RETURN_TYPE(x) x
				#define INVALIDATION_APPROACH(x) x
				#define CASE {
				#define END_CASE },
				#define ARGUMENT_CONDITION(argument_number, condition_kind) \
				{argument_number, condition_kind, {
				#define END_ARGUMENT_CONDITION }},
				#define RETURN_VALUE_CONDITION(condition_kind) \
				{ Ret, condition_kind, {
				#define END_RETURN_VALUE_CONDITION }},
				#define ARG_NO(x) x##U
				#define RANGE(x, y) { x, y },
				#define SINGLE_VALUE(x) RANGE(x, x)
				// Note: despite its name, IS_LESS_THAN expands to BO_LE (non-strict "<=").
				#define IS_LESS_THAN(arg) { BO_LE, arg }

				FunctionSummaryMap = {
				// The isascii() family of functions.
				SUMMARY(isalnum, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Boils down to isupper() or islower() or isdigit()
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('0', '9')
				RANGE('A', 'Z')
				RANGE('a', 'z')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // The locale-specific range.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				// No post-condition. We are completely unaware of
				// locale-specific return values.
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('0', '9')
				RANGE('A', 'Z')
				RANGE('a', 'z')
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isalpha, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // isupper() or islower(). Note that 'Z' is less than 'a'.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('A', 'Z')
				RANGE('a', 'z')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // The locale-specific range.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				END_CASE
				CASE // Other.
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('A', 'Z')
				RANGE('a', 'z')
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isascii, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Is ASCII.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(0, 127)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(0, 127)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isblank, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				SINGLE_VALUE('\t')
				SINGLE_VALUE(' ')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				SINGLE_VALUE('\t')
				SINGLE_VALUE(' ')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(iscntrl, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // 0..31 or 127
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(0, 31)
				SINGLE_VALUE(127)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(0, 31)
				SINGLE_VALUE(127)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isdigit, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Is a digit.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('0', '9')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('0', '9')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isgraph, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(33, 126)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(33, 126)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(islower, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Is certainly lowercase.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('a', 'z')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // Is ascii but not lowercase.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(0, 127)
				END_ARGUMENT_CONDITION
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('a', 'z')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // The locale-specific range.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				END_CASE
				CASE // Is not an unsigned char.
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(0, 255)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isprint, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(32, 126)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(32, 126)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(ispunct, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('!', '/')
				RANGE(':', '@')
				RANGE('[', '`')
				RANGE('{', '~')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('!', '/')
				RANGE(':', '@')
				RANGE('[', '`')
				RANGE('{', '~')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isspace, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Space, '\f', '\n', '\r', '\t', '\v'.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(9, 13)
				SINGLE_VALUE(' ')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // The locale-specific range.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE(9, 13)
				SINGLE_VALUE(' ')
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isupper, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE // Is certainly uppercase.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('A', 'Z')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE // The locale-specific range.
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				END_CASE
				CASE // Other.
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('A', 'Z')
				RANGE(128, 255)
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(isxdigit, ARGUMENT_TYPES(IntTy), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(EvalCallAsPure))
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), WithinRange)
				RANGE('0', '9')
				RANGE('A', 'F')
				RANGE('a', 'f')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(OutOfRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				CASE
				ARGUMENT_CONDITION(ARG_NO(0), OutOfRange)
				RANGE('0', '9')
				RANGE('A', 'F')
				RANGE('a', 'f')
				END_ARGUMENT_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(0)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY

				// The getc() family of functions that returns either a char or an EOF.
				SUMMARY(getc, ARGUMENT_TYPES(Irrelevant), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(NoEvalCall))
				CASE // FIXME: EOF is assumed to be defined as -1.
				RETURN_VALUE_CONDITION(WithinRange)
				RANGE(-1, 255)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(fgetc, ARGUMENT_TYPES(Irrelevant), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(NoEvalCall))
				CASE // FIXME: EOF is assumed to be defined as -1.
				RETURN_VALUE_CONDITION(WithinRange)
				RANGE(-1, 255)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(getchar, ARGUMENT_TYPES(), RETURN_TYPE(IntTy),
				INVALIDATION_APPROACH(NoEvalCall))
				CASE // FIXME: EOF is assumed to be defined as -1.
				RETURN_VALUE_CONDITION(WithinRange)
				RANGE(-1, 255)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY

				// read()-like functions that never return more than buffer size.
				SUMMARY(read, ARGUMENT_TYPES(Irrelevant, Irrelevant, SizeTy),
				RETURN_TYPE(SSizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(ComparesToArgument)
				IS_LESS_THAN(ARG_NO(2))
				END_RETURN_VALUE_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				RANGE(-1, SSizeMax)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(write, ARGUMENT_TYPES(Irrelevant, Irrelevant, SizeTy),
				RETURN_TYPE(SSizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(ComparesToArgument)
				IS_LESS_THAN(ARG_NO(2))
				END_RETURN_VALUE_CONDITION
				RETURN_VALUE_CONDITION(WithinRange)
				RANGE(-1, SSizeMax)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(fread,
				ARGUMENT_TYPES(Irrelevant, Irrelevant, SizeTy, Irrelevant),
				RETURN_TYPE(SizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(ComparesToArgument)
				IS_LESS_THAN(ARG_NO(2))
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(fwrite,
				ARGUMENT_TYPES(Irrelevant, Irrelevant, SizeTy, Irrelevant),
				RETURN_TYPE(SizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(ComparesToArgument)
				IS_LESS_THAN(ARG_NO(2))
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY

				// getline()-like functions either fail or read at least the delimiter.
				SUMMARY(getline, ARGUMENT_TYPES(Irrelevant, Irrelevant, Irrelevant),
				RETURN_TYPE(SSizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(-1)
				RANGE(1, SSizeMax)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				SUMMARY(getdelim,
				ARGUMENT_TYPES(Irrelevant, Irrelevant, Irrelevant, Irrelevant),
				RETURN_TYPE(SSizeTy), INVALIDATION_APPROACH(NoEvalCall))
				CASE
				RETURN_VALUE_CONDITION(WithinRange)
				SINGLE_VALUE(-1)
				RANGE(1, SSizeMax)
				END_RETURN_VALUE_CONDITION
				END_CASE
				END_SUMMARY
				};
				}

				void ento::registerStdCLibraryFunctionsChecker(CheckerManager &mgr) {
				// If this checker grows large enough to support C++, Objective-C, or other
				// standard libraries, we could use multiple register...Checker() functions,
				// which would register various checkers with the help of the same Checker
				// class, turning on different function summaries.
				mgr.registerChecker<StdLibraryFunctionsChecker>();
				}
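
The comment in applyAsWithinRange() describes "WithinRange R" as excluding everything in [T_MIN, T_MAX] that lies outside R. Below is a minimal standalone sketch of that idea, not the checker's code: `Segment`, `SegmentList` and `complementWithin` are hypothetical names, plain `int64_t` stands in for `llvm::APSInt`, and R is assumed to be non-empty, sorted and non-overlapping. It only computes the segments that would each be passed to an "assume not within" step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for IntRangeVectorTy: inclusive [first, second]
// segments, assumed sorted and non-overlapping.
using Segment = std::pair<std::int64_t, std::int64_t>;
using SegmentList = std::vector<Segment>;

// Returns the segments to exclude so that a value of a type spanning
// [TMin, TMax] ends up inside R. Order follows the comment in the checker:
// left tail, right tail, then the holes between consecutive segments.
SegmentList complementWithin(const SegmentList &R, std::int64_t TMin,
                             std::int64_t TMax) {
  SegmentList Holes;
  // Left tail [TMin, min(R) - 1], unless R already starts at TMin.
  if (R.front().first > TMin)
    Holes.push_back({TMin, R.front().first - 1});
  // Right tail [max(R) + 1, TMax], unless R already ends at TMax.
  if (R.back().second < TMax)
    Holes.push_back({R.back().second + 1, TMax});
  // Holes between neighbouring segments; adjacent segments leave no hole.
  for (std::size_t I = 1; I < R.size(); ++I) {
    std::int64_t Min = R[I - 1].second + 1;
    std::int64_t Max = R[I].first - 1;
    if (Min <= Max)
      Holes.push_back({Min, Max});
  }
  return Holes;
}
```

For the four ispunct() segments and a 32-bit int, this yields five excluded segments: the two tails plus the digit, uppercase and lowercase blocks.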

cfe/trunk/test/Analysis/std-c-library-functions.c

				// RUN: %clang_cc1 -analyze -analyzer-checker=unix.StdCLibraryFunctions,debug.ExprInspection -verify %s

				void clang_analyzer_eval(int);

				int glob;

				typedef struct FILE FILE;
				#define EOF -1

				int getc(FILE *);
				void test_getc(FILE *fp) {
				int x;
				while ((x = getc(fp)) != EOF) {
				clang_analyzer_eval(x > 255); // expected-warning{{FALSE}}
				clang_analyzer_eval(x >= 0); // expected-warning{{TRUE}}
				}
				}

				int fgetc(FILE *);
				void test_fgetc(FILE *fp) {
				clang_analyzer_eval(fgetc(fp) < 256); // expected-warning{{TRUE}}
				clang_analyzer_eval(fgetc(fp) >= 0); // expected-warning{{UNKNOWN}}
				}


				typedef unsigned long size_t;
				typedef signed long ssize_t;
				ssize_t read(int, void *, size_t);
				ssize_t write(int, const void *, size_t);
				void test_read_write(int fd, char *buf) {
				glob = 1;
				ssize_t x = write(fd, buf, 10);
				clang_analyzer_eval(glob); // expected-warning{{UNKNOWN}}
				if (x >= 0) {
				clang_analyzer_eval(x <= 10); // expected-warning{{TRUE}}
				ssize_t y = read(fd, &glob, sizeof(glob));
				if (y >= 0) {
				clang_analyzer_eval(y <= sizeof(glob)); // expected-warning{{TRUE}}
				} else {
				// -1 overflows on promotion!
				clang_analyzer_eval(y <= sizeof(glob)); // expected-warning{{FALSE}}
				}
				} else {
				clang_analyzer_eval(x == -1); // expected-warning{{TRUE}}
				}
				}

				size_t fread(void *, size_t, size_t, FILE *);
				size_t fwrite(const void *restrict, size_t, size_t, FILE *restrict);
				void test_fread_fwrite(FILE *fp, int *buf) {
				size_t x = fwrite(buf, sizeof(int), 10, fp);
				clang_analyzer_eval(x <= 10); // expected-warning{{TRUE}}
				size_t y = fread(buf, sizeof(int), 10, fp);
				clang_analyzer_eval(y <= 10); // expected-warning{{TRUE}}
				size_t z = fwrite(buf, sizeof(int), y, fp);
				// FIXME: should be TRUE once symbol-symbol constraint support is improved.
				clang_analyzer_eval(z <= y); // expected-warning{{UNKNOWN}}
				}

				ssize_t getline(char **, size_t *, FILE *);
				void test_getline(FILE *fp) {
				char *line = 0;
				size_t n = 0;
				ssize_t len;
				while ((len = getline(&line, &n, fp)) != -1) {
				clang_analyzer_eval(len == 0); // expected-warning{{FALSE}}
				}
				}

				int isascii(int);
				void test_isascii(int x) {
				clang_analyzer_eval(isascii(123)); // expected-warning{{TRUE}}
				clang_analyzer_eval(isascii(-1)); // expected-warning{{FALSE}}
				if (isascii(x)) {
				clang_analyzer_eval(x < 128); // expected-warning{{TRUE}}
				clang_analyzer_eval(x >= 0); // expected-warning{{TRUE}}
				} else {
				if (x > 42)
				clang_analyzer_eval(x >= 128); // expected-warning{{TRUE}}
				else
				clang_analyzer_eval(x < 0); // expected-warning{{TRUE}}
				}
				glob = 1;
				isascii('a');
				clang_analyzer_eval(glob); // expected-warning{{TRUE}}
				}

				int islower(int);
				void test_islower(int x) {
				clang_analyzer_eval(islower('x')); // expected-warning{{TRUE}}
				clang_analyzer_eval(islower('X')); // expected-warning{{FALSE}}
				if (islower(x))
				clang_analyzer_eval(x < 'a'); // expected-warning{{FALSE}}
				}

				int getchar(void);
				void test_getchar() {
				int x = getchar();
				if (x == EOF)
				return;
				clang_analyzer_eval(x < 0); // expected-warning{{FALSE}}
				clang_analyzer_eval(x < 256); // expected-warning{{TRUE}}
				}

				int isalpha(int);
				void test_isalpha() {
				clang_analyzer_eval(isalpha(']')); // expected-warning{{FALSE}}
				clang_analyzer_eval(isalpha('Q')); // expected-warning{{TRUE}}
				clang_analyzer_eval(isalpha(128)); // expected-warning{{UNKNOWN}}
				}

				int isalnum(int);
				void test_alnum() {
				clang_analyzer_eval(isalnum('1')); // expected-warning{{TRUE}}
				clang_analyzer_eval(isalnum(')')); // expected-warning{{FALSE}}
				}

				int isblank(int);
				void test_isblank() {
				clang_analyzer_eval(isblank('\t')); // expected-warning{{TRUE}}
				clang_analyzer_eval(isblank(' ')); // expected-warning{{TRUE}}
				clang_analyzer_eval(isblank('\n')); // expected-warning{{FALSE}}
				}

				int ispunct(int);
				void test_ispunct(int x) {
				clang_analyzer_eval(ispunct(' ')); // expected-warning{{FALSE}}
				clang_analyzer_eval(ispunct(-1)); // expected-warning{{FALSE}}
				clang_analyzer_eval(ispunct('#')); // expected-warning{{TRUE}}
				clang_analyzer_eval(ispunct('_')); // expected-warning{{TRUE}}
				if (ispunct(x))
				clang_analyzer_eval(x < 127); // expected-warning{{TRUE}}
				}

				int isupper(int);
				void test_isupper(int x) {
				if (isupper(x))
				clang_analyzer_eval(x < 'A'); // expected-warning{{FALSE}}
				}

				int isgraph(int);
				int isprint(int);
				void test_isgraph_isprint(int x) {
				char y = x;
				if (isgraph(y))
				clang_analyzer_eval(isprint(x)); // expected-warning{{TRUE}}
				}

				int isdigit(int);
				void test_mixed_branches(int x) {
				if (isdigit(x)) {
				clang_analyzer_eval(isgraph(x)); // expected-warning{{TRUE}}
				clang_analyzer_eval(isblank(x)); // expected-warning{{FALSE}}
				} else if (isascii(x)) {
				// isalnum() bifurcates here.
				clang_analyzer_eval(isalnum(x)); // expected-warning{{TRUE}} // expected-warning{{FALSE}}
				clang_analyzer_eval(isprint(x)); // expected-warning{{TRUE}} // expected-warning{{FALSE}}
				}
				}

				int isspace(int);
				void test_isspace(int x) {
				if (!isascii(x))
				return;
				char y = x;
				if (y == ' ')
				clang_analyzer_eval(isspace(x)); // expected-warning{{TRUE}}
				}

				int isxdigit(int);
				void test_isxdigit(int x) {
				if (isxdigit(x) && isupper(x)) {
				clang_analyzer_eval(x >= 'A'); // expected-warning{{TRUE}}
				clang_analyzer_eval(x <= 'F'); // expected-warning{{TRUE}}
				}
				}

				void test_call_by_pointer() {
				typedef int (*func)(int);
				func f = isascii;
				clang_analyzer_eval(f('A')); // expected-warning{{TRUE}}
				f = ispunct;
				clang_analyzer_eval(f('A')); // expected-warning{{FALSE}}
				}
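
The "-1 overflows on promotion!" case in test_read_write() and the read() remark in the specification comment both come down to how a signed result is compared with an unsigned size. A small standalone illustration follows, assuming 64-bit stand-ins for ssize_t and size_t. The two helper names are hypothetical and are not part of the checker.

```cpp
#include <cassert>
#include <cstdint>

// Models the usual arithmetic conversions: the signed result is converted
// to the unsigned type before comparing, so -1 becomes a huge value.
bool lessOrEqualPromoted(std::int64_t Ret, std::uint64_t Count) {
  return static_cast<std::uint64_t>(Ret) <= Count;
}

// Models what the ComparesToArgument handling describes: the other argument
// is cast to the type of the value being constrained, so -1 stays negative.
bool lessOrEqualCastToRet(std::int64_t Ret, std::uint64_t Count) {
  return Ret <= static_cast<std::int64_t>(Count);
}
```

With a count of 10, a result of -1 fails the promoted comparison but passes the cast-based one, which is why the summary can admit [-1, 10] for read() while `y <= sizeof(glob)` in the test still evaluates to FALSE on the error branch.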

cfe/trunk/test/Analysis/std-c-library-functions.cpp

				// RUN: %clang_cc1 -analyze -analyzer-checker=unix.StdCLibraryFunctions,debug.ExprInspection -verify %s

				// Test that we don't model functions with broken prototypes.
				// Because they probably work differently as well.
				//
				// This test lives in a separate file because we wanted to test all functions
				// in the .c file, however in C there are no overloads.

				void clang_analyzer_eval(bool);
				bool isalpha(char);

				void test() {
				clang_analyzer_eval(isalpha('A')); // no-crash // expected-warning{{UNKNOWN}}
				}
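
The prototype check this test relies on (matchesCall() in the checker) can be approximated by the standalone sketch below. It is a simplification under stated assumptions: types are compared as plain strings rather than canonical `QualType`s, an empty string plays the role of the null `Irrelevant` type, and `Signature` is a hypothetical struct rather than the checker's `FunctionSummaryTy`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplification of a summary's prototype. An empty string
// means "do not care about this type".
struct Signature {
  std::vector<std::string> ArgTypes;
  std::string RetType;
};

// Accepts a call only if the argument count matches and every type the
// spec cares about is spelled identically at the call site.
bool matchesCall(const Signature &Spec,
                 const std::vector<std::string> &ActualArgs,
                 const std::string &ActualRet) {
  if (ActualArgs.size() != Spec.ArgTypes.size())
    return false;
  if (!Spec.RetType.empty() && Spec.RetType != ActualRet)
    return false;
  for (std::size_t I = 0; I < ActualArgs.size(); ++I)
    if (!Spec.ArgTypes[I].empty() && Spec.ArgTypes[I] != ActualArgs[I])
      return false;
  return true;
}
```

Under this model, `bool isalpha(char)` does not match an `int (int)` spec, so no summary applies and the call stays UNKNOWN, as the test above expects.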