This is an archive of the discontinued LLVM Phabricator instance.

Differential D84316

[analyzer][NFC] Split CStringChecker to modeling and reporting
Needs Review · Public

Authored by steakhal on Jul 22 2020, 6:02 AM.

Details

Reviewers

NoQ
Szelethus
xazax.hun
baloghadamsoftware
vsavchenko
martong
gamesh411

Summary

This patch is an NFC. It mostly moves code segments around, with a few renamings and minor refactorings.

Summary:

Introduces an API for interacting with the modeling part of the CStringChecker.
Using this API, other checkers could query and potentially remove/invalidate information about the cstring length of an associated memory region.

This patch significantly reduces the complexity of the CStringChecker.
With a modeling layer introduced, the hierarchy will look like this:

CStringModeling (infers cstring length from string literals, invalidates, updates, etc.)
\__ CStringChecker (checker with several filter options)
     \__ NullArg (filter option registered as a distinct checker)
     \__ BadSizeArg (same...)
     \__ OutOfBounds (same...)
     \__ many more...

It is questionable if we want to keep such a hierarchy or not.
Either way, that should be done in a different patch - I suppose.
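
The split described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code of the patch: `RegionId`, `CStringLengthModel`, and `wouldOverflow` are hypothetical names, and a plain `std::map` stands in for the analyzer's program state. The modeling class only does bookkeeping; the reporting function sees it only through its public API.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a memory region; not the analyzer's MemRegion.
using RegionId = std::string;

// Modeling layer: owns the bookkeeping and nothing else.
class CStringLengthModel {
  std::map<RegionId, std::size_t> Lengths;

public:
  // Record or update the length associated with a region.
  void setLength(const RegionId &R, std::size_t Len) { Lengths[R] = Len; }

  // Drop the information, e.g. after the region has been invalidated.
  void removeLength(const RegionId &R) { Lengths.erase(R); }

  // Pure query: returns nullptr when no length is known.
  const std::size_t *getLength(const RegionId &R) const {
    auto It = Lengths.find(R);
    return It == Lengths.end() ? nullptr : &It->second;
  }
};

// Reporting layer: returns true when copying Src into a buffer of
// DstCapacity bytes (terminating null included) is known to overflow.
inline bool wouldOverflow(const CStringLengthModel &Model, const RegionId &Src,
                          std::size_t DstCapacity) {
  const std::size_t *Len = Model.getLength(Src);
  return Len && *Len + 1 > DstCapacity;
}

// Usage example; the region name "buf" is made up.
inline void example() {
  CStringLengthModel Model;
  Model.setLength("buf", 5);               // as if inferred from "hello"
  assert(wouldOverflow(Model, "buf", 5));  // needs 6 bytes with the null
  assert(!wouldOverflow(Model, "buf", 6));
  Model.removeLength("buf");               // e.g. the region was invalidated
  assert(!wouldOverflow(Model, "buf", 1)); // nothing known, nothing reported
}
```

Any other consumer could call `getLength` or `removeLength` in the same way, which is the kind of sharing the summary has in mind.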

Diff Detail

Event Timeline

steakhal created this revision. Jul 22 2020, 6:02 AM

Herald added subscribers: ASDenysPetrov, Charusso, dkrupp and 7 others. Jul 22 2020, 6:02 AM

steakhal set the repository for this revision to rG LLVM Github Monorepo. Jul 22 2020, 6:04 AM

Herald added a project: Restricted Project. Jul 22 2020, 6:04 AM

Herald added a subscriber: cfe-commits.

Shared accessors look amazing.

If i understood correctly, you're splitting up the part which performs boring bookkeeping for the state trait from the part which models strlen() and other various functions. Such separation also looks amazing because ultimately the first part can be made checker-inspecific (i.e., a reusable half-baked trait that can be instantiated multiple times to track various things regardless of their meaning).

I don't think i understand having unix.cstring.CStringChecker as one more entry in Checkers.td. Do you expect there to be a situation when enabling CStringModeling without CStringChecker actually makes sense? If not, why not keep them agglutinated? That doesn't anyhow contradict the above purpose of having boring bookkeeping separate from actual API modeling.

In D84316#2168462, @NoQ wrote:

Shared accessors look amazing.

[...] you're splitting up the part which performs boring bookkeeping for the state trait from the part which models strlen() and other various functions.

Exactly.

Such separation also looks amazing because ultimately the first part can be made checker-inspecific (i.e., a reusable half-baked trait that can be instantiated multiple times to track various things regardless of their meaning).

Could you elaborate on the latter part? (instantiated multiple times...)

I don't think i understand having unix.cstring.CStringChecker as one more entry in Checkers.td. Do you expect there to be a situation when enabling CStringModeling without CStringChecker actually makes sense?

You seem to be right.
Enabling only the cstring modeling part does not make much sense without enabling at least the CStringChecker to model the cstring manipulation functions, even if reporting for such functions is disabled.

If not, why not keep them agglutinated?

I wanted to have a separate class for bookkeeping while minimizing the necessary changes.
What do you think would be the best way to organize this separation?

Few notes:

Checkers are implemented in the anonymous namespace, so only the given file has access to them.
I wanted to separate the bookkeeping logic from the reporting/function modeling logic in different files.
I like the fact that after the change the CStringChecker implements only the evalCall checker callback.

Let me know if I misunderstood something.

Yay! This checker has become a major headache as the analyzer grew. Not on a RetainCount scale, but on a MallocChecker one. Cheap shots, I know :)

The patch looks great! I have some miscellaneous comments, but nothing blocking.

I don't think i understand having unix.cstring.CStringChecker as one more entry in Checkers.td. Do you expect there to be a situation when enabling CStringModeling without CStringChecker actually makes sense?

You seem to be right.
Enabling only the cstring modeling part does not make much sense without enabling at least the CStringChecker to model the cstring manipulation functions, even if reporting for such functions is disabled.

If not, why not keep them agglutinated?

I wanted to have a separate class for bookkeeping while minimizing the necessary changes.
What do you think would be the best way to organize this separation?

Few notes:

Checkers are implemented in the anonymous namespace, so only the given file has access to them.

I wanted to separate the bookkeeping logic from the reporting/function modeling logic in different files.

I like the fact that after the change the CStringChecker implements only the evalCall checker callback.

Let me know if I misunderstood something.

Mind that that entry does a lot more than give a flag to the user -- it generates code for a lot of the checker machinery as well. Since CStringModeling still uses the checker callbacks to set up the proper string length, it is a necessity. (Strong) checker dependencies are the exact solution to the problem where a checker cannot run without another (as I understand it, it's not only about making sense, you really can't make CStringChecker work without CStringModeling).
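
The idea of a strong dependency can be illustrated with a toy registry. This is only a sketch under stated assumptions: `ToyCheckerRegistry` and its methods are invented for illustration and do not reflect how the analyzer's registry or Checkers.td are implemented. The point is that enabling a checker first enables everything it cannot run without.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy registry; names and structure are illustrative only.
class ToyCheckerRegistry {
  std::map<std::string, std::vector<std::string>> StrongDeps;
  std::set<std::string> Seen;
  std::vector<std::string> Order;

public:
  void addChecker(const std::string &Name,
                  std::vector<std::string> Dependencies = {}) {
    StrongDeps[Name] = std::move(Dependencies);
  }

  // Enable a checker after all of its strong dependencies.
  void enable(const std::string &Name) {
    if (!Seen.insert(Name).second)
      return; // already enabled (also stops on dependency cycles)
    for (const std::string &Dep : StrongDeps[Name])
      enable(Dep);
    Order.push_back(Name);
  }

  const std::vector<std::string> &enabledInOrder() const { return Order; }
};

// Usage example mirroring the hierarchy from the summary.
inline void example() {
  ToyCheckerRegistry Registry;
  Registry.addChecker("CStringModeling");
  Registry.addChecker("CStringChecker", {"CStringModeling"});
  Registry.addChecker("NullArg", {"CStringChecker"});

  Registry.enable("NullArg");
  const std::vector<std::string> Expected = {"CStringModeling",
                                             "CStringChecker", "NullArg"};
  assert(Registry.enabledInOrder() == Expected);
}
```

In this sketch the user only asks for `NullArg`; the modeling part is switched on implicitly and is set up first.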

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
429 ↗	(On Diff #279787)	What other aspects of C strings need to be modeled? Is it only length? If so, how about we rename the checker to `CStringLengthModeling`?
495 ↗	(On Diff #279787)	I dug around a bit, and found this commit as to why this was needed: rGe56167e8f87acf87a9de3d383752e18a738cf056. So this dependency is appropriate.
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
74	This is somewhat orthogonal to the patch, but shouldn't precondition violations be reported at `preCall`?
clang/lib/StaticAnalyzer/Checkers/CStringLength.cpp
175–181 ↗	(On Diff #279787)	We traditionally put these on the bottom of the file -- I don't think this would upset the structure too much :)

In D84316#2169195, @Szelethus wrote:

[...] you really can't make CStringChecker work without CStringModeling

How should I fuse the CStringModeling and the CStringChecker together while splitting them up?

I mean, it would be best if the CStringChecker focused on modeling the related cstring functions while letting the CStringModeling do the bookkeeping.
I see some contradiction here.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
429 ↗	(On Diff #279787)	For now I think the cstring length is enough. I'm not sure if we will want to have other properties as well. You are probably right.
495 ↗	(On Diff #279787)	Interesting. I was just doing a search & replace though :)
clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp
74	That is the current behavior. We should consider in the future using `preCall` if we refactor so relentlessly.
clang/lib/StaticAnalyzer/Checkers/CStringLength.cpp
175–181 ↗	(On Diff #279787)	I wanted to place the class definition as close as possible to the registration function. I can move this though.

You could do it in the code, but if the modeling from CStringModeling weren't present, CStringChecker wouldn't work properly. So you should make it a strong dependency, just as you did in this patch. My comment was mainly a response to @NoQ :)

clang/lib/StaticAnalyzer/Checkers/CStringLength.cpp
175–181 ↗	(On Diff #279787)	Yeah, I see what you were going for, but I'd prefer to keep it down there still.

In D84316#2168730, @steakhal wrote:

I wanted to have a separate class for bookkeeping while minimizing the necessary changes.
What do you think would be the best way to organize this separation?

Few notes:

Checkers are implemented in the anonymous namespace, so only the given file has access to them.

I wanted to separate the bookkeeping logic from the reporting/function modeling logic in different files.

I like the fact that after the change the CStringChecker implements only the evalCall checker callback.

Let me know if I misunderstood something.

Mmm, none of these benefits sound like they outweigh the cost of confusing users with a new checker flag that can't even be used in any sensible way.

If you want separate files, just put the checker into a header and include it from multiple cpp files. A few checkers already do that - RetainCountChecker, MPIChecker, UninitializedObjectChecker. There's nothing fundamental about keeping checkers in an anonymous namespace.

In D84316#2168730, @steakhal wrote:

In D84316#2168462, @NoQ wrote:

Such separation also looks amazing because ultimately the first part can be made checker-inspecific (i.e., a reusable half-baked trait that can be instantiated multiple times to track various things regardless of their meaning).

Could you elaborate on the latter part? (instantiated multiple times...)

Imagine something like re-using the state trait implementation between MallocChecker and StreamChecker because they both model "resources that can be deallocated twice or leaked" - regardless of the specific nature of these resources. These checkers can implement their own API modeling maps, escape rules, warning messages, maybe model additional aspects of their problems, but fundamentally they're solving the same problem: finding leaks and overreleases of resources. This problem should ideally be solved once. This is why i advocate for abstract, generalized, "half-baked" state trait boilerplate implementations that can be re-used across checkers.
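
A "half-baked", reusable piece of bookkeeping along these lines might look like the sketch below. It is an illustration only: `ResourceTracker`, `ResourceEvent`, `HeapTracker`, and `StreamTracker` are hypothetical names, and a real state trait would live in the immutable program state rather than in a mutable map.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

enum class ResourceEvent { Ok, DoubleRelease, ReleaseOfUntracked };

// Generic bookkeeping for "resources that can be released twice or leaked".
// Key is whatever identifies a resource for a particular checker.
template <typename Key> class ResourceTracker {
  enum class Status { Acquired, Released };
  std::map<Key, Status> Resources;

public:
  void acquire(const Key &K) { Resources[K] = Status::Acquired; }

  ResourceEvent release(const Key &K) {
    auto It = Resources.find(K);
    if (It == Resources.end())
      return ResourceEvent::ReleaseOfUntracked;
    if (It->second == Status::Released)
      return ResourceEvent::DoubleRelease;
    It->second = Status::Released;
    return ResourceEvent::Ok;
  }

  // Resources still held; at the end of a path these would be leaks.
  std::size_t countLeaks() const {
    std::size_t N = 0;
    for (const auto &Entry : Resources)
      if (Entry.second == Status::Acquired)
        ++N;
    return N;
  }
};

// Two hypothetical instantiations sharing the same bookkeeping.
using HeapTracker = ResourceTracker<const void *>;
using StreamTracker = ResourceTracker<int>; // e.g. keyed by a stream id

// Usage example with made-up stream ids.
inline void example() {
  StreamTracker Streams;
  Streams.acquire(3);
  assert(Streams.countLeaks() == 1);
  assert(Streams.release(3) == ResourceEvent::Ok);
  assert(Streams.release(3) == ResourceEvent::DoubleRelease);
  assert(Streams.countLeaks() == 0);
}
```

Each checker would still supply its own API modeling, escape rules, and warning messages on top of the shared tracker.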

Created the CStringChecker subdirectory holding the related files.
CStringChecker split up into 4 files:
- CStringChecker.h: Declaration of the checker class.
- CStringChecker.cpp: Definitions ONLY of the cstring modeling functions.
- CStringLength.h: The interface declaration of the cstring query/modification API (aka. API header).
- CStringLengthModeling.cpp: Definitions of the bookkeeping part of the CStringChecker class AND the definitions of the API header.
Added the CStringChecker folder as an include directory for the analyzer library, to make it easier to include the API header.

I personally preferred the previous diff; the reporting part of the checker and the string length modeling were separated far more cleanly.

In D84316#2171267, @NoQ wrote:

Mmm, none of these benefits sound like they outweigh the cost of confusing users with a new checker flag that can't even be used in any sensible way.

I think the new checker would be an ideal candidate for a hidden checker, not to mention that D78126+D81761 enforces it anyways with asserts. With that in mind, I don't see what we're giving up here.

If you want separate files, just put the checker into a header and include it from multiple cpp files. A few checkers already do that - RetainCountChecker, MPIChecker, UninitializedObjectChecker. There's nothing fundamental about keeping checkers in an anonymous namespace.

I agree that there is no magical gain from keeping them there. UninitializedObjectChecker, btw, doesn't peek out -- only some of the related infrastructure. With that said, I don't want to make an example out of RetainCountChecker -- the way it is structured is not in line, at least the way I see it, with the modern way to develop large checkers.

edit: Accidentally put a portion of my comment in the quote.

edit2: I realize I criticized RetainCountChecker's structure without actually pointing out the problems in it, I'll follow this up.

In D84316#2171270, @NoQ wrote:

Imagine something like re-using the state trait implementation between MallocChecker and StreamChecker because they both model "resources that can be deallocated twice or leaked" - regardless of the specific nature of these resources. These checkers can implement their own API modeling maps, escape rules, warning messages, maybe model additional aspects of their problems, but fundamentally they're solving the same problem: finding leaks and overreleases of resources. This problem should ideally be solved once. This is why i advocate for abstract, generalized, "half-baked" state trait boilerplate implementations that can be re-used across checkers.

Big +1! I think there is a lot of smart in MallocChecker, and it's far less confusing now than it used to be. It'd be worth exploring the merger of those checkers, but we should probably reserve this discussion for another time.

balazske added a subscriber: balazske. Jul 28 2020, 8:59 AM

balazske added inline comments.

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	I do not like that the get and set (CStringLength) functions are not symmetrical. I (and other developers) would think that the get function returns a stored value and the set function sets it. The `getCStringLength` is more a `computeCStringLength` and additionally may manipulate the `State` too. In this form it is usable mostly only for CStringChecker. (A separate function to get the value stored in the length map should exist instead of this `Hypothetical` thing.)
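
The asymmetry under discussion can be made concrete with a small sketch. The names below (`getStoredLength`, `setStoredLength`, `computeLength`) are hypothetical and the types are stand-ins; the real functions operate on `ProgramStateRef` and `SVal`, and the `Hypothetical` flag is not modeled here. The sketch only contrasts a pure lookup with a getter that may also write.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-ins for the analyzer's region and state types.
using RegionId = std::string;
using LengthMap = std::map<RegionId, std::size_t>;

// Pure getter: never touches the map; nullptr when nothing is stored.
inline const std::size_t *getStoredLength(const LengthMap &State,
                                          const RegionId &R) {
  auto It = State.find(R);
  return It == State.end() ? nullptr : &It->second;
}

// Setter, symmetrical to the getter above.
inline void setStoredLength(LengthMap &State, const RegionId &R,
                            std::size_t Len) {
  State[R] = Len;
}

// "Compute" flavour: returns the stored value, or records a placeholder
// first, so the call may change State.
inline std::size_t computeLength(LengthMap &State, const RegionId &R,
                                 std::size_t Placeholder) {
  if (const std::size_t *Stored = getStoredLength(State, R))
    return *Stored;
  setStoredLength(State, R, Placeholder);
  return Placeholder;
}

// Usage example; the region name "s" is made up.
inline void example() {
  LengthMap State;
  assert(getStoredLength(State, "s") == nullptr);
  assert(State.empty());                       // the lookup had no side effect
  assert(computeLength(State, "s", 42) == 42);
  assert(State.size() == 1);                   // the "getter" wrote to State
}
```

A caller who wants the fused behaviour can still build it from the two pure functions, as `computeLength` does here.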

Thanks for checking this @balazske.

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	[...] get function returns a stored value and the set function sets it.

Certainly a burden to understand. It would be more appealing, but would it be more useful? The user would have to check and create if necessary regardless, so fusing these two functions is more like a feature. What use case do you have in mind for using only the query function? In other words, how can you guarantee that you will find a length for a symbol?

In this form it is usable mostly only for CStringChecker. (A separate function to get the value stored in the length map should exist instead of this Hypothetical thing.)

You are right. However, I want to focus on splitting parts without modifying the already existing API, reducing the risk of breaking things. You should expect such a change in an upcoming patch.

steakhal added inline comments. Jul 28 2020, 11:31 AM

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	On second thought, it is probably worth having a cleaner API at the cost of a slight inconvenience. If users feel like it, they can still wrap them. I will investigate it tomorrow.

steakhal added a child revision: D84979: [analyzer][NFC] Refine CStringLength modeling API. Jul 30 2020, 12:56 PM

steakhal added inline comments.

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	I made a separate patch for cleaning up this API. In D84979, these API functions now behave as expected.

The title is a little bit confusing because only the C-string size model is going to be separated and made accessible. Other than that, as @NoQ pointed out, we need a lot more of these common-API-separation patches. It is a great starting point for the CStringChecker.

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
146	Other common checker functionality folders and headers have long gone without extra CMake support. I think when we need such support, we could define it later, so you could revert this.
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	I (and other developers) would think that the get function returns a stored value and the set function sets it.

Developers should not believe the getters are pure getters. From a checker writer's point of view, you do not care whether the C-string already exists or the checker creates it during symbolic execution; you only want to get the C-string. Think about all the Static Analyzer getters as factory functions; that is the de facto standard now. For example, when you are trying to get a symbolic value with `getSVal()`, for the first occurrence of an expression no `SVal` exists, so it also creates it. With that in mind, @steakhal, could you partially revert the renaming-related refactors of D84979, please?

In D84316#2187372, @19n07u5 wrote:

The title is a little bit confusing because only the C-string size model is going to be separated and be accessible.

Could you elaborate on why the title is not precise?
It seems that the modeling logic and the reporting logic will be separated:

modeling will be implemented in CStringLengthModeling.cpp
reporting will be implemented in CStringChecker.cpp (just as it was before)

I just wanted a short (at most 80 characters long) title; if you can offer a better one, I would be pleased.

Other than that, as @NoQ pointed out, we need a lot more of these common-API-separation patches. It is a great starting point for the CStringChecker.

Thanks. I'm thinking about making the checker cleaner - we will see.

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
146	It would be easier to use the `CStringLength.h` header without specifying the complete path to it. IMO `#include "CStringChecker/CStringLength.h"` is kind of verbose compared to simply using `#include "CStringLength.h"`. As of now, I'm sticking to this.
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h
43	[...] From a checker writer's point of view, you do not care whether the C-string already exists or the checker creates it during symbolic execution; you only want to get the C-string.

I would have agreed with you before I made the D84979 patch. Now I believe that if the interface can be implemented purely, then it should be.

Think about all the Static Analyzer getters as factory functions; that is the de facto standard now.

We can always change them.

For example, when you are trying to get a symbolic value with `getSVal()`, for the first occurrence of an expression no `SVal` exists, so it also creates it.

I'm not really familiar with the internals of `getSVal()`; I'm definitely going to have a look at that. IMO `getSVal()` is a different beast compared to the functions declared in this header file.

With that in mind, @steakhal, could you partially revert the renaming-related refactors of D84979, please?

I genuinely think that I'm on the right track. If you don't mind, let's move further discussion about that to the corresponding revision.

Ping

Do I sense correctly that the only information CStringLengthModeling.cpp requires from the actual CStringChecker is a checker tag? Because if so, I think we should just separate them even more cleanly -- we could just make a CStringLengthModeling checker implement the checker callbacks in that cpp file and we wouldn't need to move CStringChecker to a header. Not that moving it there is fundamentally bad, but in this instance it seems like we're legalizing bloating checkers instead of separating them. Also, this patch seems to have a significant overlap with D84979 -- is this intentional?

In D84316#2233368, @Szelethus wrote:

Do I sense correctly that the only information CStringLengthModeling.cpp requires from the actual CStringChecker is a checker tag?

AFAIK yes.

[...] it seems like we're legalizing bloating checkers instead of separating them.

I agree. I would genuinely prefer to have a modeling checker and a completely different checker using the modeled information; let's call it CStringChecker.
Unfortunately, that approach was not really well received - probably needs further discussion.

Also, this patch seems to have a significant overlap with D84979 -- is this intentional?

It is. In fact, this patch prepares the code for it.
I wanted to separate large code motion changes (like this one) from API refactoring changes (like D84979 does).
So in some sense, these two patches are hand in hand. Have a look at them ;)

steakhal mentioned this in D86445: [analyzer][RFC] Simplify MetadataSymbol representation and resolve a CStringChecker FIXME. Sep 8 2020, 7:45 AM

NoQ added inline comments. Sep 8 2020, 2:37 PM

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.cpp
32–34	Why suddenly use arrow syntax here?
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.h
227	No NeWlInE aT eNd Of FiLe
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLengthModeling.cpp
309	Yes it is. It gets invoked during exploded graph dumps and it's an invaluable debugging facility.

I would like to discuss why we don't have a distinct checker managing the bookkeeping of the CString lengths.
I just want a clean understanding and wide consensus about this.

Personally, I would still prefer the original version of this patch (+nits of course).

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.cpp
32–34	This way I could spare the full name of the class :D I will use the qualified name instead.
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.h
227	I'm wondering why clang-format did not add this - I'm really surprised. Thanks.
clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLengthModeling.cpp
309	A strange observation to note here. In the implementation of the dump method, I use the provided `NL` and `Sep` parameters. However, in the `checker_messages` of a State dump, `Sep` seems to be substituted with the empty string. For example, taint::printTaint just ignores the `Sep` parameter, possibly to work around this issue.
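
To make the `NL`/`Sep` observation easier to follow, here is a toy dump routine. It is a sketch under assumptions: `printLengths` and `dumpToString` are invented names, `std::ostream` replaces `llvm::raw_ostream`, and a plain map replaces the program state. It shows a dump whose entries stay readable whether or not the caller passes a non-empty `Sep`.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

using ToyLengths = std::map<std::string, std::size_t>;

// Toy dump routine: Sep is written once in front of the section and NL
// ends every line, so an empty Sep only drops the leading separator.
inline void printLengths(std::ostream &Out, const ToyLengths &Lengths,
                         const char *NL, const char *Sep) {
  if (Lengths.empty())
    return; // print nothing when there is no tracked length
  Out << Sep << "CString lengths:" << NL;
  for (const auto &Entry : Lengths)
    Out << Entry.first << " : " << Entry.second << NL;
}

// Helper that captures the dump in a string.
inline std::string dumpToString(const ToyLengths &Lengths, const char *NL,
                                const char *Sep) {
  std::ostringstream OS;
  printLengths(OS, Lengths, NL, Sep);
  return OS.str();
}

// Usage example with made-up region names.
inline void example() {
  ToyLengths Lengths = {{"a", 1}, {"b", 2}};
  assert(dumpToString(Lengths, "\n", "") ==
         "CString lengths:\na : 1\nb : 2\n");
  assert(dumpToString(Lengths, "\n", "--") ==
         "--CString lengths:\na : 1\nb : 2\n");
}
```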

Diff 280398

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt

  set(LLVM_LINK_COMPONENTS
  FrontendOpenMP
  Support
  )

  add_clang_library(clangStaticAnalyzerCheckers
  AnalysisOrderChecker.cpp
  AnalyzerStatsChecker.cpp
  ArrayBoundChecker.cpp
  ArrayBoundCheckerV2.cpp
  BasicObjCFoundationChecks.cpp
  BlockInCriticalSectionChecker.cpp
  BoolAssignmentChecker.cpp
  BuiltinFunctionChecker.cpp
- CStringChecker.cpp
+ CStringChecker/CStringChecker.cpp
+ CStringChecker/CStringLengthModeling.cpp
  CStringSyntaxChecker.cpp
  CallAndMessageChecker.cpp
  CastSizeChecker.cpp
  CastToStructChecker.cpp
  CastValueChecker.cpp
  CheckObjCDealloc.cpp
  CheckObjCInstMethSignature.cpp
  CheckPlacementNew.cpp
  [110 unchanged lines not shown]
  clangAnalysis
  clangBasic
  clangLex
  clangStaticAnalyzerCore

  DEPENDS
  omp_gen
  )
+
+ target_include_directories(clangStaticAnalyzerCheckers PRIVATE
+ CStringChecker
+ )

clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp

This file was moved to clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.cpp.

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.h

This file was added.

				//= CStringChecker.h - Checks calls to C string functions -------*- C++ -*--//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Models C string related functions.
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_CSTRINGCHECKER_CSTRINGCHECKER_H
				#define LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_CSTRINGCHECKER_CSTRINGCHECKER_H

				#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"

				namespace clang {
				namespace ento {
				namespace cstring {

				struct AnyArgExpr {
				// FIXME: Remove constructor in C++17 to turn it into an aggregate.
				AnyArgExpr(const Expr *Expression, unsigned ArgumentIndex)
				: Expression{Expression}, ArgumentIndex{ArgumentIndex} {}
				const Expr *Expression;
				unsigned ArgumentIndex;
				};

				struct SourceArgExpr : AnyArgExpr {
				using AnyArgExpr::AnyArgExpr; // FIXME: Remove using in C++17.
				};

				struct DestinationArgExpr : AnyArgExpr {
				using AnyArgExpr::AnyArgExpr; // FIXME: Same.
				};

				struct SizeArgExpr : AnyArgExpr {
				using AnyArgExpr::AnyArgExpr; // FIXME: Same.
				};

				class CStringChecker
				: public Checker<eval::Call, check::PreStmt<DeclStmt>, check::LiveSymbols,
				check::DeadSymbols, check::RegionChanges> {
				mutable std::unique_ptr<BugType> BT_Null, BT_Bounds, BT_Overlap,
				BT_NotCString;

				mutable const char *CurrentFunctionDescription;

				using ErrorMessage = SmallString<128>;
				enum class AccessKind { write, read };
				enum class ConcatFnKind { none = 0, strcat = 1, strlcat = 2 };

				public:
				/// Models and checks cstring related function pre and post-conditions.
				bool evalCall(const CallEvent &Call, CheckerContext &C) const;

				/// Tracks and maintains the associated cstring lengths of memory regions.
				static void *getTag();
				void checkPreStmt(const DeclStmt *, CheckerContext &) const;
				void checkLiveSymbols(ProgramStateRef, SymbolReaper &) const;
				void checkDeadSymbols(SymbolReaper &, CheckerContext &) const;
				ProgramStateRef
				checkRegionChanges(ProgramStateRef, const InvalidatedSymbols *,
				ArrayRef<const MemRegion *>, ArrayRef<const MemRegion *>,
				const LocationContext *, const CallEvent *) const;
				// TODO: Is it useful?
				void printState(raw_ostream &Out, ProgramStateRef State, const char *NL,
				const char *Sep) const;

				/// The filter is used to filter out the diagnostics which are not enabled by
				/// the user.
				struct {
				DefaultBool CheckCStringNullArg;
				DefaultBool CheckCStringOutOfBounds;
				DefaultBool CheckCStringBufferOverlap;
				DefaultBool CheckCStringNotNullTerm;

				CheckerNameRef CheckNameCStringNullArg;
				CheckerNameRef CheckNameCStringOutOfBounds;
				CheckerNameRef CheckNameCStringBufferOverlap;
				CheckerNameRef CheckNameCStringNotNullTerm;
				} Filter;

				private:
				typedef void (CStringChecker::*FnCheck)(CheckerContext &,
				const CallExpr *) const;
				CallDescriptionMap<FnCheck> Callbacks = {
				{{CDF_MaybeBuiltin, "memcpy", 3}, &CStringChecker::evalMemcpy},
				{{CDF_MaybeBuiltin, "mempcpy", 3}, &CStringChecker::evalMempcpy},
				{{CDF_MaybeBuiltin, "memcmp", 3}, &CStringChecker::evalMemcmp},
				{{CDF_MaybeBuiltin, "memmove", 3}, &CStringChecker::evalMemmove},
				{{CDF_MaybeBuiltin, "memset", 3}, &CStringChecker::evalMemset},
				{{CDF_MaybeBuiltin, "explicit_memset", 3}, &CStringChecker::evalMemset},
				{{CDF_MaybeBuiltin, "strcpy", 2}, &CStringChecker::evalStrcpy},
				{{CDF_MaybeBuiltin, "strncpy", 3}, &CStringChecker::evalStrncpy},
				{{CDF_MaybeBuiltin, "stpcpy", 2}, &CStringChecker::evalStpcpy},
				{{CDF_MaybeBuiltin, "strlcpy", 3}, &CStringChecker::evalStrlcpy},
				{{CDF_MaybeBuiltin, "strcat", 2}, &CStringChecker::evalStrcat},
				{{CDF_MaybeBuiltin, "strncat", 3}, &CStringChecker::evalStrncat},
				{{CDF_MaybeBuiltin, "strlcat", 3}, &CStringChecker::evalStrlcat},
				{{CDF_MaybeBuiltin, "strlen", 1}, &CStringChecker::evalstrLength},
				{{CDF_MaybeBuiltin, "strnlen", 2}, &CStringChecker::evalstrnLength},
				{{CDF_MaybeBuiltin, "strcmp", 2}, &CStringChecker::evalStrcmp},
				{{CDF_MaybeBuiltin, "strncmp", 3}, &CStringChecker::evalStrncmp},
				{{CDF_MaybeBuiltin, "strcasecmp", 2}, &CStringChecker::evalStrcasecmp},
				{{CDF_MaybeBuiltin, "strncasecmp", 3}, &CStringChecker::evalStrncasecmp},
				{{CDF_MaybeBuiltin, "strsep", 2}, &CStringChecker::evalStrsep},
				{{CDF_MaybeBuiltin, "bcopy", 3}, &CStringChecker::evalBcopy},
				{{CDF_MaybeBuiltin, "bcmp", 3}, &CStringChecker::evalMemcmp},
				{{CDF_MaybeBuiltin, "bzero", 2}, &CStringChecker::evalBzero},
				{{CDF_MaybeBuiltin, "explicit_bzero", 2}, &CStringChecker::evalBzero},
				};

				// These require a bit of special handling.
				CallDescription StdCopy{{"std", "copy"}, 3},
				StdCopyBackward{{"std", "copy_backward"}, 3};

				FnCheck identifyCall(const CallEvent &Call, CheckerContext &C) const;
				void evalMemcpy(CheckerContext &C, const CallExpr *CE) const;
				void evalMempcpy(CheckerContext &C, const CallExpr *CE) const;
				void evalMemmove(CheckerContext &C, const CallExpr *CE) const;
				void evalBcopy(CheckerContext &C, const CallExpr *CE) const;
				void evalCopyCommon(CheckerContext &C, const CallExpr *CE,
				ProgramStateRef state, SizeArgExpr Size,
				DestinationArgExpr Dest, SourceArgExpr Source,
				bool Restricted, bool IsMempcpy) const;

				void evalMemcmp(CheckerContext &C, const CallExpr *CE) const;

				void evalstrLength(CheckerContext &C, const CallExpr *CE) const;
				void evalstrnLength(CheckerContext &C, const CallExpr *CE) const;
				void evalstrLengthCommon(CheckerContext &C, const CallExpr *CE,
				bool IsStrnlen = false) const;

				void evalStrcpy(CheckerContext &C, const CallExpr *CE) const;
				void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
				void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
				void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
				void evalStrcpyCommon(CheckerContext &C, const CallExpr *CE, bool ReturnEnd,
				bool IsBounded, ConcatFnKind appendK,
				bool returnPtr = true) const;

				void evalStrcat(CheckerContext &C, const CallExpr *CE) const;
				void evalStrncat(CheckerContext &C, const CallExpr *CE) const;
				void evalStrlcat(CheckerContext &C, const CallExpr *CE) const;

				void evalStrcmp(CheckerContext &C, const CallExpr *CE) const;
				void evalStrncmp(CheckerContext &C, const CallExpr *CE) const;
				void evalStrcasecmp(CheckerContext &C, const CallExpr *CE) const;
				void evalStrncasecmp(CheckerContext &C, const CallExpr *CE) const;
				void evalStrcmpCommon(CheckerContext &C, const CallExpr *CE,
				bool IsBounded = false, bool IgnoreCase = false) const;

				void evalStrsep(CheckerContext &C, const CallExpr *CE) const;

				void evalStdCopy(CheckerContext &C, const CallExpr *CE) const;
				void evalStdCopyBackward(CheckerContext &C, const CallExpr *CE) const;
				void evalStdCopyCommon(CheckerContext &C, const CallExpr *CE) const;
				void evalMemset(CheckerContext &C, const CallExpr *CE) const;
				void evalBzero(CheckerContext &C, const CallExpr *CE) const;

				// Utility methods

				static ErrorMessage createOutOfBoundErrorMsg(StringRef FunctionDescription,
				AccessKind Access);

				/// Simply wraps the cstring::getCStringLength function to emit warnings.
				SVal getCStringLengthChecked(CheckerContext &Ctx, ProgramStateRef &State,
				const Expr *Ex, SVal Buf,
				bool hypothetical = false) const;

				std::pair<ProgramStateRef, ProgramStateRef> static assumeZero(
				CheckerContext &C, ProgramStateRef state, SVal V, QualType Ty);

				static ProgramStateRef InvalidateBuffer(CheckerContext &C,
				ProgramStateRef state, const Expr *Ex,
				SVal V, bool IsSourceBuffer,
				const Expr *Size);

				static bool SummarizeRegion(raw_ostream &os, ASTContext &Ctx,
				const MemRegion *MR);

				static bool memsetAux(const Expr *DstBuffer, SVal CharE, const Expr *Size,
				CheckerContext &C, ProgramStateRef &State);

				// Re-usable checks
				ProgramStateRef checkNonNull(CheckerContext &C, ProgramStateRef State,
				AnyArgExpr Arg, SVal l) const;
				ProgramStateRef CheckLocation(CheckerContext &C, ProgramStateRef state,
				AnyArgExpr Buffer, SVal Element,
				AccessKind Access) const;
				ProgramStateRef CheckBufferAccess(CheckerContext &C, ProgramStateRef State,
				AnyArgExpr Buffer, SizeArgExpr Size,
				AccessKind Access) const;
				ProgramStateRef CheckOverlap(CheckerContext &C, ProgramStateRef state,
				SizeArgExpr Size, AnyArgExpr First,
				AnyArgExpr Second) const;
				void emitOverlapBug(CheckerContext &C, ProgramStateRef state,
				const Stmt *First, const Stmt *Second) const;

				void emitNullArgBug(CheckerContext &C, ProgramStateRef State, const Stmt *S,
				StringRef WarningMsg) const;
				void emitOutOfBoundsBug(CheckerContext &C, ProgramStateRef State,
				const Stmt *S, StringRef WarningMsg) const;
				void emitNotCStringBug(CheckerContext &C, ProgramStateRef State,
				const Stmt *S, StringRef WarningMsg) const;
				void emitAdditionOverflowBug(CheckerContext &C, ProgramStateRef State) const;

				ProgramStateRef checkAdditionOverflow(CheckerContext &C,
				ProgramStateRef state, NonLoc left,
				NonLoc right) const;

				// Return true if the destination buffer of the copy function may be in bound.
				// Expects SVal of Size to be positive and unsigned.
				// Expects SVal of FirstBuf to be a FieldRegion.
				static bool IsFirstBufInBound(CheckerContext &C, ProgramStateRef state,
				const Expr *FirstBuf, const Expr *Size);
				};

				} // namespace cstring
				} // namespace ento
				} // namespace clang

				#endif
				No newline at end of file
				NoQ: No NeWlInE aT eNd Of FiLe
				steakhal: I'm wondering why clang-format did not add this - I'm really surprised. Thanks.
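
The header above documents `getCStringLengthChecked` as a thin wrapper that calls `cstring::getCStringLength` and emits warnings. A minimal, self-contained sketch of that split follows; it is not the analyzer's code. `toy::Length`, `toy::Buffer`, `toy::getLength` and `toy::Checker` are hypothetical stand-ins invented for illustration, and a plain `Valid` flag stands in for the undefined `SVal` case.

```cpp
#include <string>
#include <vector>

namespace toy {

// Hypothetical result type: Valid is false when no C string length exists
// (the analogue of an undefined SVal in the patch).
struct Length {
  bool Valid;
  unsigned Value;
};

// Hypothetical buffer description used instead of SVal/MemRegion.
struct Buffer {
  std::string Name;
  bool IsCString;
  unsigned Size;
};

// Modeling layer: answers the query and never reports anything.
inline Length getLength(const Buffer &Buf) {
  if (!Buf.IsCString)
    return Length{false, 0};
  return Length{true, Buf.Size};
}

// Reporting layer: owns the filter flag and the collected diagnostics.
class Checker {
public:
  bool CheckNotNullTerm = true;
  std::vector<std::string> Warnings;

  Length getLengthChecked(const Buffer &Buf, const std::string &Fn) {
    Length Len = getLength(Buf);
    // Simply return if everything goes well.
    if (Len.Valid)
      return Len;
    // Otherwise explain the failure, but only if the check is enabled.
    if (CheckNotNullTerm)
      Warnings.push_back("Argument to " + Fn + " is " + Buf.Name +
                         ", which is not a null-terminated string");
    return Len;
  }
};

} // namespace toy
```

Because the query itself never reports, another checker could call `toy::getLength` directly without triggering diagnostics, which appears to be the kind of reuse the patch summary has in mind.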

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringChecker.cpp

This file was moved from clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp.

//= CStringChecker.cpp - Checks calls to C string functions --------*- C++ -*-//		//= CStringChecker.cpp - Checks calls to C string functions --------*- C++ -*-//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This defines CStringChecker, which is an assortment of checks on calls		// This defines CStringChecker, which is an assortment of checks on calls
// to functions in <string.h>.		// to functions in <string.h>.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "InterCheckerAPI.h"		#include "CStringChecker.h"
		#include "CStringLength.h"
#include "clang/Basic/CharInfo.h"		#include "clang/Basic/CharInfo.h"
#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"		#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"		#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"		#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallString.h"		#include "llvm/ADT/SmallString.h"
#include "llvm/ADT/StringExtras.h"		#include "llvm/ADT/StringExtras.h"
#include "llvm/Support/raw_ostream.h"		#include "llvm/Support/raw_ostream.h"

using namespace clang;		namespace clang {
using namespace ento;		namespace ento {
		namespace cstring {
namespace {
struct AnyArgExpr {		auto CStringChecker::createOutOfBoundErrorMsg(StringRef FunctionDescription,
// FIXME: Remove constructor in C++17 to turn it into an aggregate.		AccessKind Access)
AnyArgExpr(const Expr *Expression, unsigned ArgumentIndex)		-> ErrorMessage {
		NoQ: Why suddenly use arrow syntax here?
		steakhal: This way I could spare the full name of the class :D I will use the qualified name instead.
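
For context on the exchange above: a trailing return type is looked up after the `Class::` qualifier of the function name, so a type declared inside the class can be named without qualification. The sketch below assumes a hypothetical `Checker` class with a nested `ErrorMessage` alias (a `std::string` here, not the `SmallString<128>` from the patch) and shows both spellings.

```cpp
#include <string>

// Hypothetical stand-in for a checker class with a nested type alias.
class Checker {
public:
  using ErrorMessage = std::string;

  static ErrorMessage describeQualified(bool IsWrite);
  static ErrorMessage describeTrailing(bool IsWrite);
};

// Leading return type: the alias is named before the compiler has seen
// "Checker::", so it has to be spelled Checker::ErrorMessage.
Checker::ErrorMessage Checker::describeQualified(bool IsWrite) {
  return IsWrite ? "overflows the destination buffer"
                 : "accesses out-of-bound array element";
}

// Trailing return type: it follows "Checker::describeTrailing", so names are
// looked up in class scope and plain ErrorMessage resolves.
auto Checker::describeTrailing(bool IsWrite) -> ErrorMessage {
  return describeQualified(IsWrite);
}
```

Both definitions declare the same return type; the choice is purely stylistic, which is why switching to the qualified name costs nothing functionally.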
: Expression{Expression}, ArgumentIndex{ArgumentIndex} {}
const Expr *Expression;
unsigned ArgumentIndex;
};

struct SourceArgExpr : AnyArgExpr {
using AnyArgExpr::AnyArgExpr; // FIXME: Remove using in C++17.
};

struct DestinationArgExpr : AnyArgExpr {
using AnyArgExpr::AnyArgExpr; // FIXME: Same.
};

struct SizeArgExpr : AnyArgExpr {
using AnyArgExpr::AnyArgExpr; // FIXME: Same.
};

using ErrorMessage = SmallString<128>;
enum class AccessKind { write, read };

static ErrorMessage createOutOfBoundErrorMsg(StringRef FunctionDescription,
AccessKind Access) {
ErrorMessage Message;		ErrorMessage Message;
llvm::raw_svector_ostream Os(Message);		llvm::raw_svector_ostream Os(Message);

// Function classification like: Memory copy function		// Function classification like: Memory copy function
Os << toUppercase(FunctionDescription.front())		Os << toUppercase(FunctionDescription.front())
<< &FunctionDescription.data()[1];		<< &FunctionDescription.data()[1];

if (Access == AccessKind::write) {		if (Access == AccessKind::write) {
Os << " overflows the destination buffer";		Os << " overflows the destination buffer";
} else { // read access		} else { // read access
Os << " accesses out-of-bound array element";		Os << " accesses out-of-bound array element";
}		}

return Message;		return Message;
}		}

enum class ConcatFnKind { none = 0, strcat = 1, strlcat = 2 };		//===----------------------------------------------------------------------===//
		Szelethus: This is somewhat orthogonal to the patch, but shouldn't precondition violations be reported at `preCall`?
		steakhal: That is the current behavior. We should consider using `preCall` in the future if we refactor so relentlessly.
class CStringChecker : public Checker< eval::Call,		// Individual checks and utility methods.
check::PreStmt<DeclStmt>,		//===----------------------------------------------------------------------===//
check::LiveSymbols,
check::DeadSymbols,
check::RegionChanges
> {
mutable std::unique_ptr<BugType> BT_Null, BT_Bounds, BT_Overlap,
BT_NotCString, BT_AdditionOverflow;

mutable const char *CurrentFunctionDescription;

public:
/// The filter is used to filter out the diagnostics which are not enabled by
/// the user.
struct CStringChecksFilter {
DefaultBool CheckCStringNullArg;
DefaultBool CheckCStringOutOfBounds;
DefaultBool CheckCStringBufferOverlap;
DefaultBool CheckCStringNotNullTerm;

CheckerNameRef CheckNameCStringNullArg;
CheckerNameRef CheckNameCStringOutOfBounds;
CheckerNameRef CheckNameCStringBufferOverlap;
CheckerNameRef CheckNameCStringNotNullTerm;
};

CStringChecksFilter Filter;

static void *getTag() { static int tag; return &tag; }

bool evalCall(const CallEvent &Call, CheckerContext &C) const;
void checkPreStmt(const DeclStmt *DS, CheckerContext &C) const;
void checkLiveSymbols(ProgramStateRef state, SymbolReaper &SR) const;
void checkDeadSymbols(SymbolReaper &SR, CheckerContext &C) const;

ProgramStateRef
checkRegionChanges(ProgramStateRef state,
const InvalidatedSymbols *,
ArrayRef<const MemRegion *> ExplicitRegions,
ArrayRef<const MemRegion *> Regions,
const LocationContext *LCtx,
const CallEvent *Call) const;

typedef void (CStringChecker::*FnCheck)(CheckerContext &,
const CallExpr *) const;
CallDescriptionMap<FnCheck> Callbacks = {
{{CDF_MaybeBuiltin, "memcpy", 3}, &CStringChecker::evalMemcpy},
{{CDF_MaybeBuiltin, "mempcpy", 3}, &CStringChecker::evalMempcpy},
{{CDF_MaybeBuiltin, "memcmp", 3}, &CStringChecker::evalMemcmp},
{{CDF_MaybeBuiltin, "memmove", 3}, &CStringChecker::evalMemmove},
{{CDF_MaybeBuiltin, "memset", 3}, &CStringChecker::evalMemset},
{{CDF_MaybeBuiltin, "explicit_memset", 3}, &CStringChecker::evalMemset},
{{CDF_MaybeBuiltin, "strcpy", 2}, &CStringChecker::evalStrcpy},
{{CDF_MaybeBuiltin, "strncpy", 3}, &CStringChecker::evalStrncpy},
{{CDF_MaybeBuiltin, "stpcpy", 2}, &CStringChecker::evalStpcpy},
{{CDF_MaybeBuiltin, "strlcpy", 3}, &CStringChecker::evalStrlcpy},
{{CDF_MaybeBuiltin, "strcat", 2}, &CStringChecker::evalStrcat},
{{CDF_MaybeBuiltin, "strncat", 3}, &CStringChecker::evalStrncat},
{{CDF_MaybeBuiltin, "strlcat", 3}, &CStringChecker::evalStrlcat},
{{CDF_MaybeBuiltin, "strlen", 1}, &CStringChecker::evalstrLength},
{{CDF_MaybeBuiltin, "strnlen", 2}, &CStringChecker::evalstrnLength},
{{CDF_MaybeBuiltin, "strcmp", 2}, &CStringChecker::evalStrcmp},
{{CDF_MaybeBuiltin, "strncmp", 3}, &CStringChecker::evalStrncmp},
{{CDF_MaybeBuiltin, "strcasecmp", 2}, &CStringChecker::evalStrcasecmp},
{{CDF_MaybeBuiltin, "strncasecmp", 3}, &CStringChecker::evalStrncasecmp},
{{CDF_MaybeBuiltin, "strsep", 2}, &CStringChecker::evalStrsep},
{{CDF_MaybeBuiltin, "bcopy", 3}, &CStringChecker::evalBcopy},
{{CDF_MaybeBuiltin, "bcmp", 3}, &CStringChecker::evalMemcmp},
{{CDF_MaybeBuiltin, "bzero", 2}, &CStringChecker::evalBzero},
{{CDF_MaybeBuiltin, "explicit_bzero", 2}, &CStringChecker::evalBzero},
};

// These require a bit of special handling.
CallDescription StdCopy{{"std", "copy"}, 3},
StdCopyBackward{{"std", "copy_backward"}, 3};

FnCheck identifyCall(const CallEvent &Call, CheckerContext &C) const;
void evalMemcpy(CheckerContext &C, const CallExpr *CE) const;
void evalMempcpy(CheckerContext &C, const CallExpr *CE) const;
void evalMemmove(CheckerContext &C, const CallExpr *CE) const;
void evalBcopy(CheckerContext &C, const CallExpr *CE) const;
void evalCopyCommon(CheckerContext &C, const CallExpr *CE,
ProgramStateRef state, SizeArgExpr Size,
DestinationArgExpr Dest, SourceArgExpr Source,
bool Restricted, bool IsMempcpy) const;

void evalMemcmp(CheckerContext &C, const CallExpr *CE) const;

void evalstrLength(CheckerContext &C, const CallExpr *CE) const;
void evalstrnLength(CheckerContext &C, const CallExpr *CE) const;
void evalstrLengthCommon(CheckerContext &C,
const CallExpr *CE,
bool IsStrnlen = false) const;

void evalStrcpy(CheckerContext &C, const CallExpr *CE) const;
void evalStrncpy(CheckerContext &C, const CallExpr *CE) const;
void evalStpcpy(CheckerContext &C, const CallExpr *CE) const;
void evalStrlcpy(CheckerContext &C, const CallExpr *CE) const;
void evalStrcpyCommon(CheckerContext &C, const CallExpr *CE, bool ReturnEnd,
bool IsBounded, ConcatFnKind appendK,
bool returnPtr = true) const;

void evalStrcat(CheckerContext &C, const CallExpr *CE) const;
void evalStrncat(CheckerContext &C, const CallExpr *CE) const;
void evalStrlcat(CheckerContext &C, const CallExpr *CE) const;

void evalStrcmp(CheckerContext &C, const CallExpr *CE) const;
void evalStrncmp(CheckerContext &C, const CallExpr *CE) const;
void evalStrcasecmp(CheckerContext &C, const CallExpr *CE) const;
void evalStrncasecmp(CheckerContext &C, const CallExpr *CE) const;
void evalStrcmpCommon(CheckerContext &C,
const CallExpr *CE,
bool IsBounded = false,
bool IgnoreCase = false) const;

void evalStrsep(CheckerContext &C, const CallExpr *CE) const;

void evalStdCopy(CheckerContext &C, const CallExpr *CE) const;
void evalStdCopyBackward(CheckerContext &C, const CallExpr *CE) const;
void evalStdCopyCommon(CheckerContext &C, const CallExpr *CE) const;
void evalMemset(CheckerContext &C, const CallExpr *CE) const;
void evalBzero(CheckerContext &C, const CallExpr *CE) const;

// Utility methods		static const StringLiteral *getCStringLiteral(SVal val) {
std::pair<ProgramStateRef , ProgramStateRef >		// Get the memory region pointed to by the val.
static assumeZero(CheckerContext &C,		const MemRegion *bufRegion = val.getAsRegion();
ProgramStateRef state, SVal V, QualType Ty);		if (!bufRegion)
		return nullptr;

static ProgramStateRef setCStringLength(ProgramStateRef state,		// Strip casts off the memory region.
const MemRegion *MR,		bufRegion = bufRegion->StripCasts();
SVal strLength);
static SVal getCStringLengthForRegion(CheckerContext &C,
ProgramStateRef &state,
const Expr *Ex,
const MemRegion *MR,
bool hypothetical);
SVal getCStringLength(CheckerContext &C,
ProgramStateRef &state,
const Expr *Ex,
SVal Buf,
bool hypothetical = false) const;

const StringLiteral *getCStringLiteral(CheckerContext &C,
ProgramStateRef &state,
const Expr *expr,
SVal val) const;

static ProgramStateRef InvalidateBuffer(CheckerContext &C,		// Cast the memory region to a string region.
ProgramStateRef state,		const StringRegion *strRegion = dyn_cast<StringRegion>(bufRegion);
const Expr *Ex, SVal V,		if (!strRegion)
bool IsSourceBuffer,		return nullptr;
const Expr *Size);

static bool SummarizeRegion(raw_ostream &os, ASTContext &Ctx,		// Return the actual string in the string region.
const MemRegion *MR);		return strRegion->getStringLiteral();
		}

static bool memsetAux(const Expr *DstBuffer, SVal CharE,		SVal CStringChecker::getCStringLengthChecked(CheckerContext &Ctx,
const Expr *Size, CheckerContext &C,		ProgramStateRef &State,
ProgramStateRef &State);		const Expr *Ex, SVal Buf,
		bool hypothetical) const {
		SVal CStrLen = cstring::getCStringLength(Ctx, State, Ex, Buf, hypothetical);

// Re-usable checks		// Simply return if everything goes well.
ProgramStateRef checkNonNull(CheckerContext &C, ProgramStateRef State,		// Otherwise we shall investigate why did it fail.
AnyArgExpr Arg, SVal l) const;		if (!CStrLen.isUndef())
ProgramStateRef CheckLocation(CheckerContext &C, ProgramStateRef state,		return CStrLen;
AnyArgExpr Buffer, SVal Element,
AccessKind Access) const;
ProgramStateRef CheckBufferAccess(CheckerContext &C, ProgramStateRef State,
AnyArgExpr Buffer, SizeArgExpr Size,
AccessKind Access) const;
ProgramStateRef CheckOverlap(CheckerContext &C, ProgramStateRef state,
SizeArgExpr Size, AnyArgExpr First,
AnyArgExpr Second) const;
void emitOverlapBug(CheckerContext &C,
ProgramStateRef state,
const Stmt *First,
const Stmt *Second) const;

void emitNullArgBug(CheckerContext &C, ProgramStateRef State, const Stmt *S,		// Handle if the buffer was not referring to a memory region.
StringRef WarningMsg) const;		const MemRegion *MR = Buf.getAsRegion();
void emitOutOfBoundsBug(CheckerContext &C, ProgramStateRef State,		if (!MR) {
const Stmt *S, StringRef WarningMsg) const;		// If we can't get a region, see if it's something we /know/ isn't a
void emitNotCStringBug(CheckerContext &C, ProgramStateRef State,		// C string. In the context of locations, the only time we can issue such
const Stmt *S, StringRef WarningMsg) const;		// a warning is for labels.
void emitAdditionOverflowBug(CheckerContext &C, ProgramStateRef State) const;		if (Optional<loc::GotoLabel> Label = Buf.getAs<loc::GotoLabel>()) {
		if (Filter.CheckCStringNotNullTerm) {
		SmallString<120> buf;
		llvm::raw_svector_ostream os(buf);
		assert(CurrentFunctionDescription);
		os << "Argument to " << CurrentFunctionDescription
		<< " is the address of the label '" << Label->getLabel()->getName()
		<< "', which is not a null-terminated string";

ProgramStateRef checkAdditionOverflow(CheckerContext &C,		emitNotCStringBug(Ctx, State, Ex, os.str());
ProgramStateRef state,		}
NonLoc left,		return UndefinedVal();
NonLoc right) const;		}
		}

// Return true if the destination buffer of the copy function may be in bound.		// Other regions (mostly non-data) can't have a reliable C string length.
// Expects SVal of Size to be positive and unsigned.		// In this case, an error is emitted and UndefinedVal is returned.
// Expects SVal of FirstBuf to be a FieldRegion.		// The caller should always be prepared to handle this case.
static bool IsFirstBufInBound(CheckerContext &C,		if (Filter.CheckCStringNotNullTerm) {
ProgramStateRef state,		SmallString<120> buf;
const Expr *FirstBuf,		llvm::raw_svector_ostream os(buf);
const Expr *Size);
};

} //end anonymous namespace		assert(CurrentFunctionDescription);
		os << "Argument to " << CurrentFunctionDescription << " is ";

REGISTER_MAP_WITH_PROGRAMSTATE(CStringLength, const MemRegion *, SVal)		if (SummarizeRegion(os, Ctx.getASTContext(), MR))
		os << ", which is not a null-terminated string";
		else
		os << "not a null-terminated string";

//===----------------------------------------------------------------------===//		emitNotCStringBug(Ctx, State, Ex, os.str());
// Individual checks and utility methods.		}
//===----------------------------------------------------------------------===//		return UndefinedVal();
		}

std::pair<ProgramStateRef , ProgramStateRef >		std::pair<ProgramStateRef , ProgramStateRef >
CStringChecker::assumeZero(CheckerContext &C, ProgramStateRef state, SVal V,		CStringChecker::assumeZero(CheckerContext &C, ProgramStateRef state, SVal V,
QualType Ty) {		QualType Ty) {
Optional<DefinedSVal> val = V.getAs<DefinedSVal>();		Optional<DefinedSVal> val = V.getAs<DefinedSVal>();
if (!val)		if (!val)
return std::pair<ProgramStateRef , ProgramStateRef >(state, state);		return std::pair<ProgramStateRef , ProgramStateRef >(state, state);

[400 unchanged lines hidden]	if (Optional<NonLoc> maxMinusRightNL = maxMinusRight.getAs<NonLoc>()) {
// From now on, assume an overflow didn't occur.		// From now on, assume an overflow didn't occur.
assert(stateOkay);		assert(stateOkay);
state = stateOkay;		state = stateOkay;
}		}

return state;		return state;
}		}

ProgramStateRef CStringChecker::setCStringLength(ProgramStateRef state,
const MemRegion *MR,
SVal strLength) {
assert(!strLength.isUndef() && "Attempt to set an undefined string length");

MR = MR->StripCasts();

switch (MR->getKind()) {
case MemRegion::StringRegionKind:
// FIXME: This can happen if we strcpy() into a string region. This is
// undefined [C99 6.4.5p6], but we should still warn about it.
return state;

case MemRegion::SymbolicRegionKind:
case MemRegion::AllocaRegionKind:
case MemRegion::NonParamVarRegionKind:
case MemRegion::ParamVarRegionKind:
case MemRegion::FieldRegionKind:
case MemRegion::ObjCIvarRegionKind:
// These are the types we can currently track string lengths for.
break;

case MemRegion::ElementRegionKind:
// FIXME: Handle element regions by upper-bounding the parent region's
// string length.
return state;

default:
// Other regions (mostly non-data) can't have a reliable C string length.
// For now, just ignore the change.
// FIXME: These are rare but not impossible. We should output some kind of
// warning for things like strcpy((char[]){'a', 0}, "b");
return state;
}

if (strLength.isUnknown())
return state->remove<CStringLength>(MR);

return state->set<CStringLength>(MR, strLength);
}

SVal CStringChecker::getCStringLengthForRegion(CheckerContext &C,
ProgramStateRef &state,
const Expr *Ex,
const MemRegion *MR,
bool hypothetical) {
if (!hypothetical) {
// If there's a recorded length, go ahead and return it.
const SVal *Recorded = state->get<CStringLength>(MR);
if (Recorded)
return *Recorded;
}

// Otherwise, get a new symbol and update the state.
SValBuilder &svalBuilder = C.getSValBuilder();
QualType sizeTy = svalBuilder.getContext().getSizeType();
SVal strLength = svalBuilder.getMetadataSymbolVal(CStringChecker::getTag(),
MR, Ex, sizeTy,
C.getLocationContext(),
C.blockCount());

if (!hypothetical) {
if (Optional<NonLoc> strLn = strLength.getAs<NonLoc>()) {
// In case of unbounded calls strlen etc bound the range to SIZE_MAX/4
BasicValueFactory &BVF = svalBuilder.getBasicValueFactory();
const llvm::APSInt &maxValInt = BVF.getMaxValue(sizeTy);
llvm::APSInt fourInt = APSIntType(maxValInt).getValue(4);
const llvm::APSInt *maxLengthInt = BVF.evalAPSInt(BO_Div, maxValInt,
fourInt);
NonLoc maxLength = svalBuilder.makeIntVal(*maxLengthInt);
SVal evalLength = svalBuilder.evalBinOpNN(state, BO_LE, *strLn,
maxLength, sizeTy);
state = state->assume(evalLength.castAs<DefinedOrUnknownSVal>(), true);
}
state = state->set<CStringLength>(MR, strLength);
}

return strLength;
}

SVal CStringChecker::getCStringLength(CheckerContext &C, ProgramStateRef &state,
const Expr *Ex, SVal Buf,
bool hypothetical) const {
const MemRegion *MR = Buf.getAsRegion();
if (!MR) {
// If we can't get a region, see if it's something we /know/ isn't a
// C string. In the context of locations, the only time we can issue such
// a warning is for labels.
if (Optional<loc::GotoLabel> Label = Buf.getAs<loc::GotoLabel>()) {
if (Filter.CheckCStringNotNullTerm) {
SmallString<120> buf;
llvm::raw_svector_ostream os(buf);
assert(CurrentFunctionDescription);
os << "Argument to " << CurrentFunctionDescription
<< " is the address of the label '" << Label->getLabel()->getName()
<< "', which is not a null-terminated string";

emitNotCStringBug(C, state, Ex, os.str());
}
return UndefinedVal();
}

// If it's not a region and not a label, give up.
return UnknownVal();
}

// If we have a region, strip casts from it and see if we can figure out
// its length. For anything we can't figure out, just return UnknownVal.
MR = MR->StripCasts();

switch (MR->getKind()) {
case MemRegion::StringRegionKind: {
// Modifying the contents of string regions is undefined [C99 6.4.5p6],
// so we can assume that the byte length is the correct C string length.
SValBuilder &svalBuilder = C.getSValBuilder();
QualType sizeTy = svalBuilder.getContext().getSizeType();
const StringLiteral *strLit = cast<StringRegion>(MR)->getStringLiteral();
return svalBuilder.makeIntVal(strLit->getByteLength(), sizeTy);
}
case MemRegion::SymbolicRegionKind:
case MemRegion::AllocaRegionKind:
case MemRegion::NonParamVarRegionKind:
case MemRegion::ParamVarRegionKind:
case MemRegion::FieldRegionKind:
case MemRegion::ObjCIvarRegionKind:
return getCStringLengthForRegion(C, state, Ex, MR, hypothetical);
case MemRegion::CompoundLiteralRegionKind:
// FIXME: Can we track this? Is it necessary?
return UnknownVal();
case MemRegion::ElementRegionKind:
// FIXME: How can we handle this? It's not good enough to subtract the
// offset from the base string length; consider "123\x00567" and &a[5].
return UnknownVal();
default:
// Other regions (mostly non-data) can't have a reliable C string length.
// In this case, an error is emitted and UndefinedVal is returned.
// The caller should always be prepared to handle this case.
if (Filter.CheckCStringNotNullTerm) {
SmallString<120> buf;
llvm::raw_svector_ostream os(buf);

assert(CurrentFunctionDescription);
os << "Argument to " << CurrentFunctionDescription << " is ";

if (SummarizeRegion(os, C.getASTContext(), MR))
os << ", which is not a null-terminated string";
else
os << "not a null-terminated string";

emitNotCStringBug(C, state, Ex, os.str());
}
return UndefinedVal();
}
}

const StringLiteral *CStringChecker::getCStringLiteral(CheckerContext &C,
ProgramStateRef &state, const Expr *expr, SVal val) const {

// Get the memory region pointed to by the val.
const MemRegion *bufRegion = val.getAsRegion();
if (!bufRegion)
return nullptr;

// Strip casts off the memory region.
bufRegion = bufRegion->StripCasts();

// Cast the memory region to a string region.
const StringRegion *strRegion= dyn_cast<StringRegion>(bufRegion);
if (!strRegion)
return nullptr;

// Return the actual string in the string region.
return strRegion->getStringLiteral();
}

bool CStringChecker::IsFirstBufInBound(CheckerContext &C,		bool CStringChecker::IsFirstBufInBound(CheckerContext &C,
ProgramStateRef state,		ProgramStateRef state,
const Expr *FirstBuf,		const Expr *FirstBuf,
const Expr *Size) {		const Expr *Size) {
// If we do not know that the buffer is long enough we return 'true'.		// If we do not know that the buffer is long enough we return 'true'.
// Otherwise the parent region of this field region would also get		// Otherwise the parent region of this field region would also get
// invalidated, which would lead to warnings based on an unknown state.		// invalidated, which would lead to warnings based on an unknown state.

[209 unchanged lines hidden]	if (StateWholeReg && !StateNotWholeReg && StateNullChar &&
// third argument, just invalidate buffer.		// third argument, just invalidate buffer.
State = InvalidateBuffer(C, State, DstBuffer, MemVal,		State = InvalidateBuffer(C, State, DstBuffer, MemVal,
/*IsSourceBuffer*/ false, Size);		/*IsSourceBuffer*/ false, Size);
}		}

if (StateNullChar && !StateNonNullChar) {		if (StateNullChar && !StateNonNullChar) {
// If the value of the second argument of 'memset()' is zero, set the		// If the value of the second argument of 'memset()' is zero, set the
// string length of destination buffer to 0 directly.		// string length of destination buffer to 0 directly.
State = setCStringLength(State, MR,		State = cstring::setCStringLength(
svalBuilder.makeZeroVal(Ctx.getSizeType()));		State, MR, svalBuilder.makeZeroVal(Ctx.getSizeType()));
} else if (!StateNullChar && StateNonNullChar) {		} else if (!StateNullChar && StateNonNullChar) {
SVal NewStrLen = svalBuilder.getMetadataSymbolVal(		SVal NewStrLen = svalBuilder.getMetadataSymbolVal(
CStringChecker::getTag(), MR, DstBuffer, Ctx.getSizeType(),		CStringChecker::getTag(), MR, DstBuffer, Ctx.getSizeType(),
C.getLocationContext(), C.blockCount());		C.getLocationContext(), C.blockCount());

// If the value of second argument is not zero, then the string length		// If the value of second argument is not zero, then the string length
// is at least the size argument.		// is at least the size argument.
SVal NewStrLenGESize = svalBuilder.evalBinOp(		SVal NewStrLenGESize = svalBuilder.evalBinOp(
State, BO_GE, NewStrLen, SizeVal, svalBuilder.getConditionType());		State, BO_GE, NewStrLen, SizeVal, svalBuilder.getConditionType());

State = setCStringLength(		State = cstring::setCStringLength(
State->assume(NewStrLenGESize.castAs<DefinedOrUnknownSVal>(), true),		State->assume(NewStrLenGESize.castAs<DefinedOrUnknownSVal>(), true),
MR, NewStrLen);		MR, NewStrLen);
}		}
} else {		} else {
// If the offset is not zero and char value is not concrete, we can do		// If the offset is not zero and char value is not concrete, we can do
// nothing but invalidate the buffer.		// nothing but invalidate the buffer.
State = InvalidateBuffer(C, State, DstBuffer, MemVal,		State = InvalidateBuffer(C, State, DstBuffer, MemVal,
/*IsSourceBuffer*/ false, Size);		/*IsSourceBuffer*/ false, Size);
[271 unchanged lines hidden]	void CStringChecker::evalstrLengthCommon(CheckerContext &C, const CallExpr *CE,
// Check that the string argument is non-null.		// Check that the string argument is non-null.
AnyArgExpr Arg = {CE->getArg(0), 0};		AnyArgExpr Arg = {CE->getArg(0), 0};
SVal ArgVal = state->getSVal(Arg.Expression, LCtx);		SVal ArgVal = state->getSVal(Arg.Expression, LCtx);
state = checkNonNull(C, state, Arg, ArgVal);		state = checkNonNull(C, state, Arg, ArgVal);

if (!state)		if (!state)
return;		return;

SVal strLength = getCStringLength(C, state, Arg.Expression, ArgVal);		SVal strLength = getCStringLengthChecked(C, state, Arg.Expression, ArgVal);

// If the argument isn't a valid C string, there's no valid state to		// If the argument isn't a valid C string, there's no valid state to
// transition to.		// transition to.
if (strLength.isUndef())		if (strLength.isUndef())
return;		return;

DefinedOrUnknownSVal result = UnknownVal();		DefinedOrUnknownSVal result = UnknownVal();

[150 unchanged lines hidden]	void CStringChecker::evalStrcpyCommon(CheckerContext &C, const CallExpr *CE,
// Check that the source is non-null.		// Check that the source is non-null.
SourceArgExpr srcExpr = {CE->getArg(1), 1};		SourceArgExpr srcExpr = {CE->getArg(1), 1};
SVal srcVal = state->getSVal(srcExpr.Expression, LCtx);		SVal srcVal = state->getSVal(srcExpr.Expression, LCtx);
state = checkNonNull(C, state, srcExpr, srcVal);		state = checkNonNull(C, state, srcExpr, srcVal);
if (!state)		if (!state)
return;		return;

// Get the string length of the source.		// Get the string length of the source.
SVal strLength = getCStringLength(C, state, srcExpr.Expression, srcVal);		SVal strLength =
		getCStringLengthChecked(C, state, srcExpr.Expression, srcVal);
Optional<NonLoc> strLengthNL = strLength.getAs<NonLoc>();		Optional<NonLoc> strLengthNL = strLength.getAs<NonLoc>();

// Get the string length of the destination buffer.		// Get the string length of the destination buffer.
SVal dstStrLength = getCStringLength(C, state, Dst.Expression, DstVal);		SVal dstStrLength = getCStringLengthChecked(C, state, Dst.Expression, DstVal);
Optional<NonLoc> dstStrLengthNL = dstStrLength.getAs<NonLoc>();		Optional<NonLoc> dstStrLengthNL = dstStrLength.getAs<NonLoc>();

// If the source isn't a valid C string, give up.		// If the source isn't a valid C string, give up.
if (strLength.isUndef())		if (strLength.isUndef())
return;		return;

SValBuilder &svalBuilder = C.getSValBuilder();		SValBuilder &svalBuilder = C.getSValBuilder();
QualType cmpTy = svalBuilder.getConditionType();		QualType cmpTy = svalBuilder.getConditionType();
[207 unchanged lines hidden]	if (amountCopiedNL && dstStrLengthNL) {
*dstStrLengthNL, sizeTy);		*dstStrLengthNL, sizeTy);
}		}

// If we couldn't get a single value for the final string length,		// If we couldn't get a single value for the final string length,
// we can at least bound it by the individual lengths.		// we can at least bound it by the individual lengths.
if (finalStrLength.isUnknown()) {		if (finalStrLength.isUnknown()) {
// Try to get a "hypothetical" string length symbol, which we can later		// Try to get a "hypothetical" string length symbol, which we can later
// set as a real value if that turns out to be the case.		// set as a real value if that turns out to be the case.
finalStrLength = getCStringLength(C, state, CE, DstVal, true);		finalStrLength = getCStringLengthChecked(C, state, CE, DstVal, true);
assert(!finalStrLength.isUndef());		assert(!finalStrLength.isUndef());

if (Optional<NonLoc> finalStrLengthNL = finalStrLength.getAs<NonLoc>()) {		if (Optional<NonLoc> finalStrLengthNL = finalStrLength.getAs<NonLoc>()) {
if (amountCopiedNL && appendK == ConcatFnKind::none) {		if (amountCopiedNL && appendK == ConcatFnKind::none) {
// we overwrite dst string with the src		// we overwrite dst string with the src
// finalStrLength >= srcStrLength		// finalStrLength >= srcStrLength
SVal sourceInResult = svalBuilder.evalBinOpNN(		SVal sourceInResult = svalBuilder.evalBinOpNN(
state, BO_GE, finalStrLengthNL, amountCopiedNL, cmpTy);		state, BO_GE, finalStrLengthNL, amountCopiedNL, cmpTy);
[93 lines not shown]	if (Optional<loc::MemRegionVal> dstRegVal =
if (IsBounded && (appendK == ConcatFnKind::none)) {		if (IsBounded && (appendK == ConcatFnKind::none)) {
// strncpy is annoying in that it doesn't guarantee to null-terminate		// strncpy is annoying in that it doesn't guarantee to null-terminate
// the result string. If the original string didn't fit entirely inside		// the result string. If the original string didn't fit entirely inside
// the bound (including the null-terminator), we don't know how long the		// the bound (including the null-terminator), we don't know how long the
// result is.		// result is.
if (amountCopied != strLength)		if (amountCopied != strLength)
finalStrLength = UnknownVal();		finalStrLength = UnknownVal();
}		}
state = setCStringLength(state, dstRegVal->getRegion(), finalStrLength);		state = cstring::setCStringLength(state, dstRegVal->getRegion(),
		finalStrLength);
}		}

assert(state);		assert(state);

if (returnPtr) {		if (returnPtr) {
// If this is a stpcpy-style copy, but we were unable to check for a buffer		// If this is a stpcpy-style copy, but we were unable to check for a buffer
// overflow, we still need a result. Conjure a return value.		// overflow, we still need a result. Conjure a return value.
if (ReturnEnd && Result.isUnknown()) {		if (ReturnEnd && Result.isUnknown()) {
[43 lines not shown]	void CStringChecker::evalStrcmpCommon(CheckerContext &C, const CallExpr *CE,
// Check that the second string is non-null.		// Check that the second string is non-null.
AnyArgExpr Right = {CE->getArg(1), 1};		AnyArgExpr Right = {CE->getArg(1), 1};
SVal RightVal = state->getSVal(Right.Expression, LCtx);		SVal RightVal = state->getSVal(Right.Expression, LCtx);
state = checkNonNull(C, state, Right, RightVal);		state = checkNonNull(C, state, Right, RightVal);
if (!state)		if (!state)
return;		return;

// Get the string length of the first string or give up.		// Get the string length of the first string or give up.
SVal LeftLength = getCStringLength(C, state, Left.Expression, LeftVal);		SVal LeftLength = getCStringLengthChecked(C, state, Left.Expression, LeftVal);
if (LeftLength.isUndef())		if (LeftLength.isUndef())
return;		return;

// Get the string length of the second string or give up.		// Get the string length of the second string or give up.
SVal RightLength = getCStringLength(C, state, Right.Expression, RightVal);		SVal RightLength =
		getCStringLengthChecked(C, state, Right.Expression, RightVal);
if (RightLength.isUndef())		if (RightLength.isUndef())
return;		return;

// If we know the two buffers are the same, we know the result is 0.		// If we know the two buffers are the same, we know the result is 0.
// First, get the two buffers' addresses. Another checker will have already		// First, get the two buffers' addresses. Another checker will have already
// made sure they're not undefined.		// made sure they're not undefined.
DefinedOrUnknownSVal LV = LeftVal.castAs<DefinedOrUnknownSVal>();		DefinedOrUnknownSVal LV = LeftVal.castAs<DefinedOrUnknownSVal>();
DefinedOrUnknownSVal RV = RightVal.castAs<DefinedOrUnknownSVal>();		DefinedOrUnknownSVal RV = RightVal.castAs<DefinedOrUnknownSVal>();
[18 lines not shown]	void CStringChecker::evalStrcmpCommon(CheckerContext &C, const CallExpr *CE,

assert(StNotSameBuf);		assert(StNotSameBuf);
state = StNotSameBuf;		state = StNotSameBuf;

// At this point we can go about comparing the two buffers.		// At this point we can go about comparing the two buffers.
// For now, we only do this if they're both known string literals.		// For now, we only do this if they're both known string literals.

// Attempt to extract string literals from both expressions.		// Attempt to extract string literals from both expressions.
const StringLiteral *LeftStrLiteral =		const StringLiteral *LeftStrLiteral = getCStringLiteral(LeftVal);
getCStringLiteral(C, state, Left.Expression, LeftVal);		const StringLiteral *RightStrLiteral = getCStringLiteral(RightVal);
const StringLiteral *RightStrLiteral =
getCStringLiteral(C, state, Right.Expression, RightVal);
bool canComputeResult = false;		bool canComputeResult = false;
SVal resultVal = svalBuilder.conjureSymbolVal(nullptr, CE, LCtx,		SVal resultVal = svalBuilder.conjureSymbolVal(nullptr, CE, LCtx,
C.blockCount());		C.blockCount());

if (LeftStrLiteral && RightStrLiteral) {		if (LeftStrLiteral && RightStrLiteral) {
StringRef LeftStrRef = LeftStrLiteral->getString();		StringRef LeftStrRef = LeftStrLiteral->getString();
StringRef RightStrRef = RightStrLiteral->getString();		StringRef RightStrRef = RightStrLiteral->getString();

[295 lines not shown]	bool CStringChecker::evalCall(const CallEvent &Call, CheckerContext &C) const {
// handler.		// handler.
// Note, the custom CString evaluation calls assume that basic safety		// Note, the custom CString evaluation calls assume that basic safety
// properties are held. However, if the user chooses to turn off some of these		// properties are held. However, if the user chooses to turn off some of these
// checks, we ignore the issues and leave the call evaluation to a generic		// checks, we ignore the issues and leave the call evaluation to a generic
// handler.		// handler.
return C.isDifferent();		return C.isDifferent();
}		}

void CStringChecker::checkPreStmt(const DeclStmt *DS, CheckerContext &C) const {		} // namespace cstring
// Record string length for char a[] = "abc";		} // namespace ento
ProgramStateRef state = C.getState();		} // namespace clang

for (const auto *I : DS->decls()) {
const VarDecl *D = dyn_cast<VarDecl>(I);
if (!D)
continue;

// FIXME: Handle array fields of structs.
if (!D->getType()->isArrayType())
continue;

const Expr *Init = D->getInit();
if (!Init)
continue;
if (!isa<StringLiteral>(Init))
continue;

Loc VarLoc = state->getLValue(D, C.getLocationContext());
const MemRegion *MR = VarLoc.getAsRegion();
if (!MR)
continue;

SVal StrVal = C.getSVal(Init);
assert(StrVal.isValid() && "Initializer string is unknown or undefined");
DefinedOrUnknownSVal strLength =
getCStringLength(C, state, Init, StrVal).castAs<DefinedOrUnknownSVal>();

state = state->set<CStringLength>(MR, strLength);
}

C.addTransition(state);
}

ProgramStateRef
CStringChecker::checkRegionChanges(ProgramStateRef state,
const InvalidatedSymbols *,
ArrayRef<const MemRegion *> ExplicitRegions,
ArrayRef<const MemRegion *> Regions,
const LocationContext *LCtx,
const CallEvent *Call) const {
CStringLengthTy Entries = state->get<CStringLength>();
if (Entries.isEmpty())
return state;

llvm::SmallPtrSet<const MemRegion *, 8> Invalidated;
llvm::SmallPtrSet<const MemRegion *, 32> SuperRegions;

// First build sets for the changed regions and their super-regions.
for (ArrayRef<const MemRegion *>::iterator
I = Regions.begin(), E = Regions.end(); I != E; ++I) {
const MemRegion *MR = *I;
Invalidated.insert(MR);

SuperRegions.insert(MR);
while (const SubRegion *SR = dyn_cast<SubRegion>(MR)) {
MR = SR->getSuperRegion();
SuperRegions.insert(MR);
}
}

CStringLengthTy::Factory &F = state->get_context<CStringLength>();

// Then loop over the entries in the current state.
for (CStringLengthTy::iterator I = Entries.begin(),
E = Entries.end(); I != E; ++I) {
const MemRegion *MR = I.getKey();

// Is this entry for a super-region of a changed region?
if (SuperRegions.count(MR)) {
Entries = F.remove(Entries, MR);
continue;
}

// Is this entry for a sub-region of a changed region?
const MemRegion *Super = MR;
while (const SubRegion *SR = dyn_cast<SubRegion>(Super)) {
Super = SR->getSuperRegion();
if (Invalidated.count(Super)) {
Entries = F.remove(Entries, MR);
break;
}
}
}

return state->set<CStringLength>(Entries);
}

void CStringChecker::checkLiveSymbols(ProgramStateRef state,		void clang::ento::registerCStringModeling(CheckerManager &Mgr) {
SymbolReaper &SR) const {		Mgr.registerChecker<clang::ento::cstring::CStringChecker>();
// Mark all symbols in our string length map as valid.
CStringLengthTy Entries = state->get<CStringLength>();

for (CStringLengthTy::iterator I = Entries.begin(), E = Entries.end();
I != E; ++I) {
SVal Len = I.getData();

for (SymExpr::symbol_iterator si = Len.symbol_begin(),
se = Len.symbol_end(); si != se; ++si)
SR.markInUse(*si);
}
}		}

void CStringChecker::checkDeadSymbols(SymbolReaper &SR,		bool clang::ento::shouldRegisterCStringModeling(const CheckerManager &) {
CheckerContext &C) const {
ProgramStateRef state = C.getState();
CStringLengthTy Entries = state->get<CStringLength>();
if (Entries.isEmpty())
return;

CStringLengthTy::Factory &F = state->get_context<CStringLength>();
for (CStringLengthTy::iterator I = Entries.begin(), E = Entries.end();
I != E; ++I) {
SVal Len = I.getData();
if (SymbolRef Sym = Len.getAsSymbol()) {
if (SR.isDead(Sym))
Entries = F.remove(Entries, I.getKey());
}
}

state = state->set<CStringLength>(Entries);
C.addTransition(state);
}

void ento::registerCStringModeling(CheckerManager &Mgr) {
Mgr.registerChecker<CStringChecker>();
}

bool ento::shouldRegisterCStringModeling(const CheckerManager &mgr) {
return true;		return true;
}		}

#define REGISTER_CHECKER(name) \		#define REGISTER_CHECKER(name) \
void ento::register##name(CheckerManager &mgr) { \		void clang::ento::register##name(clang::ento::CheckerManager &mgr) { \
CStringChecker *checker = mgr.getChecker<CStringChecker>(); \		auto *checker = mgr.getChecker<clang::ento::cstring::CStringChecker>(); \
checker->Filter.Check##name = true; \		checker->Filter.Check##name = true; \
checker->Filter.CheckName##name = mgr.getCurrentCheckerName(); \		checker->Filter.CheckName##name = mgr.getCurrentCheckerName(); \
} \		} \
\		\
bool ento::shouldRegister##name(const CheckerManager &mgr) { return true; }		bool clang::ento::shouldRegister##name( \
		const clang::ento::CheckerManager &mgr) { \
		return true; \
		}

REGISTER_CHECKER(CStringNullArg)		REGISTER_CHECKER(CStringNullArg)
REGISTER_CHECKER(CStringOutOfBounds)		REGISTER_CHECKER(CStringOutOfBounds)
REGISTER_CHECKER(CStringBufferOverlap)		REGISTER_CHECKER(CStringBufferOverlap)
REGISTER_CHECKER(CStringNotNullTerm)		REGISTER_CHECKER(CStringNotNullTerm)

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLength.h

This file was added.

				//=== CStringLength.h Query and store the length of a cstring. ---- C++ ---=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Defines an interface for interacting and manipulating the associated cstring
				// length of a given memory region.
				// You can assign a cstring length to any memory region.
				// The represented value is what strlen would return on the given memory region.
				// Eg: 3 for both "ABC" and "abc\00def".
				//
				//===----------------------------------------------------------------------===//

				#ifndef LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_CSTRINGLENGTH_H
				#define LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_CSTRINGLENGTH_H

				#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"

				namespace clang {
				namespace ento {
				class CheckerContext;

				namespace cstring {

				/// Assigns a cstring length to a memory region.
				LLVM_NODISCARD ProgramStateRef setCStringLength(ProgramStateRef State,
				const MemRegion *MR,
				SVal StrLength);

				/// Removes the assigned cstring length from the memory region.
				/// It is useful for invalidation.
				LLVM_NODISCARD ProgramStateRef removeCStringLength(ProgramStateRef State,
				const MemRegion *MR);

				// FIXME: Eventually rework the interface of this function.
				// Especially the magic 'Hypothetical' parameter.
				LLVM_NODISCARD SVal getCStringLength(CheckerContext &Ctx,
				ProgramStateRef &State, const Expr *Ex,
				SVal Buf, bool Hypothetical = false);
				balazske (Not Done): I do not like that the get and set (CStringLength) functions are not symmetrical. I (and other developers) would think that the get function returns a stored value and the set function sets it. `getCStringLength` is more of a `computeCStringLength` and may additionally manipulate the `State` too. In this form it is mostly usable only by CStringChecker. (A separate function to get the value stored in the length map should exist instead of this `Hypothetical` thing.)

				steakhal (Author, Done):
				> [...] get function returns a stored value and the set function sets it.
				Certainly a burden to understand. It would be more appealing, but more useful? The user would have to check and create if necessary regardless. So fusing these two functions is more like a feature. What use case do you have in mind that uses only the query function? In other words, how can you guarantee that you will find a length for a symbol?
				> In this form it is mostly usable only by CStringChecker. (A separate function to get the value stored in the length map should exist instead of this `Hypothetical` thing.)
				You are right. However, I want to focus on splitting parts without modifying the already existing API, reducing the risk of breaking things. You should expect such a change in an upcoming patch.

				steakhal (Author, Done): On second thought, it is probably worth having a cleaner API at the cost of a slight inconvenience. If users feel like it, they can still wrap them. I will investigate it tomorrow.

				steakhal (Author, Done): I made a separate patch for cleaning up this API. In D84979 these API functions now behave as expected.

				Ignotus (Not Done):
				> I (and other developers) would think that the get function returns a stored value and the set function sets it.
				Developers should not believe the getters are pure getters. From a checker writer's point of view, you do not care whether the C-string already exists or the checker creates it during symbolic execution, you only want to get the C-string. Think about all the Static Analyzer getters as factory functions, that is the de facto standard now. For example, when you are trying to get a symbolic value with `getSVal()`, for the first occurrence of an expression no `SVal` exists, so it also creates it. With that in mind, @steakhal, could you partially revert the renaming related refactors of D84979, please?

				steakhal (Author, Done):
				> [...] From a checker writer's point of view, you do not care whether the C-string already exists or the checker creates it during symbolic execution, you only want to get the C-string.
				I would have agreed with you before I made the D84979 patch. Now I believe that if the interface can be implemented purely, then it should be.
				> Think about all the Static Analyzer getters as factory functions, that is the de facto standard now.
				We can always change them.
				> For example, when you are trying to get a symbolic value with `getSVal()`, for the first occurrence of an expression no `SVal` exists, so it also creates it.
				I'm not really familiar with the internals of `getSVal()`; I'm definitely going to have a look at that. IMO `getSVal()` is a different beast compared to the functions declared in this header file.
				> With that in mind, @steakhal, could you partially revert the renaming related refactors of D84979, please?
				I genuinely think that I'm on the right track. If you don't mind, let's move further discussion about that to the corresponding revision.

				LLVM_DUMP_METHOD void dumpCStringLengths(ProgramStateRef State,
				raw_ostream &Out = llvm::errs(),
				const char *NL = "\n",
				const char *Sep = " : ");
				} // namespace cstring
				} // namespace ento
				} // namespace clang

				#endif
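
The inline discussion on `getCStringLength` above turns on the difference between a pure query and a get-or-create accessor. The following is a minimal sketch of that distinction, using toy stand-ins rather than the analyzer's types. `Region`, `LengthMap`, `lookupLength`, and `getOrCreateLength` are hypothetical names chosen for illustration and do not appear in the patch.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-ins: a "region" is just a name and a "length" is an
// integer symbol id. These are NOT the analyzer's MemRegion or SVal.
using Region = std::string;
using LengthMap = std::map<Region, int>;

// Pure query: reports what is stored and never modifies the map.
std::optional<int> lookupLength(const LengthMap &M, const Region &R) {
  auto It = M.find(R);
  if (It == M.end())
    return std::nullopt;
  return It->second;
}

// Get-or-create: returns the stored value if there is one; otherwise it
// produces a fresh symbol id, records it, and returns it. The map passed by
// reference may change as a side effect.
int getOrCreateLength(LengthMap &M, const Region &R, int &NextSymbol) {
  if (std::optional<int> Known = lookupLength(M, R))
    return *Known;
  int Fresh = NextSymbol++;
  M.emplace(R, Fresh);
  return Fresh;
}
```

In this sketch a caller that only wants to inspect the map uses `lookupLength`, while a caller that needs some value regardless uses `getOrCreateLength` and accepts the state change.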

clang/lib/StaticAnalyzer/Checkers/CStringChecker/CStringLengthModeling.cpp

This file was added.

				//=== CStringLengthModeling.cpp Implementation of CStringLength API C++ -*--=//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Implements the CStringLength API and the CStringChecker bookkeeping parts.
				// Updates the associated cstring lengths of memory regions:
				// - Infers the cstring length of string literals.
				// - Removes cstring length associations of dead symbols.
				// - Handles region invalidation.
				//
				//===----------------------------------------------------------------------===//

				#include "CStringChecker.h"
				#include "CStringLength.h"

				#include "clang/StaticAnalyzer/Core/BugReporter/BugReporter.h"
				#include "clang/StaticAnalyzer/Core/CheckerManager.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
				#include "llvm/Support/raw_ostream.h"

				using namespace clang;
				using namespace ento;
				using namespace cstring;

				/// Associates an strlen to a memory region.
				REGISTER_MAP_WITH_PROGRAMSTATE(CStringLengthMap, const MemRegion *, SVal)

				//===----------------------------------------------------------------------===//
				// Implementation of the public CStringLength API.
				//===----------------------------------------------------------------------===//

				ProgramStateRef cstring::setCStringLength(ProgramStateRef State,
				const MemRegion *MR, SVal StrLength) {
				assert(!StrLength.isUndef() && "Attempt to set an undefined string length");

				MR = MR->StripCasts();

				switch (MR->getKind()) {
				case MemRegion::StringRegionKind:
				// FIXME: This can happen if we strcpy() into a string region. This is
				// undefined [C99 6.4.5p6], but we should still warn about it.
				return State;

				case MemRegion::SymbolicRegionKind:
				case MemRegion::AllocaRegionKind:
				case MemRegion::NonParamVarRegionKind:
				case MemRegion::ParamVarRegionKind:
				case MemRegion::FieldRegionKind:
				case MemRegion::ObjCIvarRegionKind:
				// These are the types we can currently track string lengths for.
				break;

				case MemRegion::ElementRegionKind:
				// FIXME: Handle element regions by upper-bounding the parent region's
				// string length.
				return State;

				default:
				// Other regions (mostly non-data) can't have a reliable C string length.
				// For now, just ignore the change.
				// FIXME: These are rare but not impossible. We should output some kind of
				// warning for things like strcpy((char[]){'a', 0}, "b");
				return State;
				}

				if (StrLength.isUnknown())
				return removeCStringLength(State, MR);
				return State->set<CStringLengthMap>(MR, StrLength);
				}

				ProgramStateRef cstring::removeCStringLength(ProgramStateRef State,
				const MemRegion *MR) {
				return State->remove<CStringLengthMap>(MR);
				}

				static SVal getCStringLengthForRegion(CheckerContext &Ctx,
				ProgramStateRef &State, const Expr *Ex,
				const MemRegion *MR, bool Hypothetical) {
				if (!Hypothetical) {
				// If there's a recorded length, go ahead and return it.
				if (const SVal *Recorded = State->get<CStringLengthMap>(MR))
				return *Recorded;
				}

				// Otherwise, get a new symbol and update the state.
				SValBuilder &SVB = Ctx.getSValBuilder();
				QualType SizeTy = SVB.getContext().getSizeType();
				NonLoc CStrLen =
				SVB.getMetadataSymbolVal(CStringChecker::getTag(), MR, Ex, SizeTy,
				Ctx.getLocationContext(), Ctx.blockCount())
				.castAs<NonLoc>();

				if (!Hypothetical) {
				// In case of unbounded calls strlen etc bound the range to SIZE_MAX/4
				BasicValueFactory &BVF = SVB.getBasicValueFactory();
				const llvm::APSInt &MaxValue = BVF.getMaxValue(SizeTy);
				const llvm::APSInt Four = APSIntType(MaxValue).getValue(4);
				const llvm::APSInt *MaxLength = BVF.evalAPSInt(BO_Div, MaxValue, Four);
				const NonLoc MaxLengthSVal = SVB.makeIntVal(*MaxLength);
				SVal Constrained =
				SVB.evalBinOpNN(State, BO_LE, CStrLen, MaxLengthSVal, SizeTy);
				State = State->assume(Constrained.castAs<DefinedOrUnknownSVal>(), true);
				State = State->set<CStringLengthMap>(MR, CStrLen);
				}

				return CStrLen;
				}

				SVal cstring::getCStringLength(CheckerContext &Ctx, ProgramStateRef &State,
				const Expr *Ex, SVal Buf,
				bool Hypothetical /*=false*/) {
				if (Buf.isUnknownOrUndef())
				return Buf;

				if (Buf.getAs<loc::GotoLabel>())
				return UndefinedVal();

				// If it's not a region, give up.
				const MemRegion *MR = Buf.getAsRegion();
				if (!MR)
				return UnknownVal();

				// If we have a region, strip casts from it and see if we can figure out
				// its length. For anything we can't figure out, just return UnknownVal.
				MR = MR->StripCasts();

				switch (MR->getKind()) {
				case MemRegion::StringRegionKind: {
				// Modifying the contents of string regions is undefined [C99 6.4.5p6],
				// so we can assume that the byte length is the correct C string length.
				SValBuilder &SVB = Ctx.getSValBuilder();
				QualType SizeTy = SVB.getContext().getSizeType();
				const StringLiteral *StrLiteral =
				cast<StringRegion>(MR)->getStringLiteral();
				return SVB.makeIntVal(StrLiteral->getByteLength(), SizeTy);
				}
				case MemRegion::SymbolicRegionKind:
				case MemRegion::AllocaRegionKind:
				case MemRegion::NonParamVarRegionKind:
				case MemRegion::ParamVarRegionKind:
				case MemRegion::FieldRegionKind:
				case MemRegion::ObjCIvarRegionKind:
				return getCStringLengthForRegion(Ctx, State, Ex, MR, Hypothetical);
				case MemRegion::CompoundLiteralRegionKind:
				// FIXME: Can we track this? Is it necessary?
				return UnknownVal();
				case MemRegion::ElementRegionKind:
				// FIXME: How can we handle this? It's not good enough to subtract the
				// offset from the base string length; consider "123\x00567" and &a[5].
				return UnknownVal();
				default:
				// Other regions (mostly non-data) can't have a reliable C string length.
				return UndefinedVal();
				}
				}

				void cstring::dumpCStringLengths(ProgramStateRef State, raw_ostream &Out,
				const char *NL, const char *Sep) {
				const CStringLengthMapTy Items = State->get<CStringLengthMap>();
				if (!Items.isEmpty())
				Out << "CString lengths:" << NL;
				for (const auto &Item : Items) {
				Item.first->dumpToStream(Out);
				Out << Sep;
				Item.second.dumpToStream(Out);
				Out << NL;
				}
				}

				//===----------------------------------------------------------------------===//
				// Implementation of the tracking and bookkeeping part of the CStringChecker.
				// Updates the CStringLengthMap.
				// - Infers the cstring length of string literals.
				// - Removes cstring length associations of dead symbols.
				// - Handles region invalidation.
				//===----------------------------------------------------------------------===//

				void *CStringChecker::getTag() {
				static int Tag;
				return &Tag;
				}

				void CStringChecker::checkPreStmt(const DeclStmt *DS, CheckerContext &C) const {
				// Record string length for char a[] = "abc";
				ProgramStateRef state = C.getState();

				for (const auto *I : DS->decls()) {
				const VarDecl *D = dyn_cast<VarDecl>(I);
				if (!D)
				continue;

				// FIXME: Handle array fields of structs.
				if (!D->getType()->isArrayType())
				continue;

				const Expr *Init = D->getInit();
				if (!Init)
				continue;
				if (!isa<StringLiteral>(Init))
				continue;

				Loc VarLoc = state->getLValue(D, C.getLocationContext());
				const MemRegion *MR = VarLoc.getAsRegion();
				if (!MR)
				continue;

				SVal StrVal = C.getSVal(Init);
				assert(StrVal.isValid() && "Initializer string is unknown or undefined");
				DefinedOrUnknownSVal strLength =
				getCStringLength(C, state, Init, StrVal).castAs<DefinedOrUnknownSVal>();

				state = state->set<CStringLengthMap>(MR, strLength);
				}

				C.addTransition(state);
				}

				void CStringChecker::checkLiveSymbols(ProgramStateRef State,
				SymbolReaper &SR) const {
				// Mark all symbols in our string length map as valid.
				for (const auto &Item : State->get<CStringLengthMap>()) {
				SVal Len = Item.second;
				const auto LenSymbolRange =
				llvm::make_range(Len.symbol_begin(), Len.symbol_end());
				for (SymbolRef Symbol : LenSymbolRange)
				SR.markInUse(Symbol);
				}
				}

				void CStringChecker::checkDeadSymbols(SymbolReaper &SR,
				CheckerContext &C) const {
				ProgramStateRef State = C.getState();
				CStringLengthMapTy Entries = State->get<CStringLengthMap>();
				if (Entries.isEmpty())
				return;

				CStringLengthMapTy::Factory &F = State->get_context<CStringLengthMap>();
				for (CStringLengthMapTy::iterator I = Entries.begin(), E = Entries.end();
				I != E; ++I) {
				SVal Len = I.getData();
				if (SymbolRef Sym = Len.getAsSymbol()) {
				if (SR.isDead(Sym))
				Entries = F.remove(Entries, I.getKey());
				}
				}

				State = State->set<CStringLengthMap>(Entries);
				C.addTransition(State);
				}

				ProgramStateRef CStringChecker::checkRegionChanges(
				ProgramStateRef state, const InvalidatedSymbols *,
				ArrayRef<const MemRegion *> ExplicitRegions,
				ArrayRef<const MemRegion *> Regions, const LocationContext *,
				const CallEvent *) const {
				CStringLengthMapTy Entries = state->get<CStringLengthMap>();
				if (Entries.isEmpty())
				return state;

				llvm::SmallPtrSet<const MemRegion *, 8> Invalidated;
				llvm::SmallPtrSet<const MemRegion *, 32> SuperRegions;

				// First build sets for the changed regions and their super-regions.
				for (ArrayRef<const MemRegion *>::iterator I = Regions.begin(),
				E = Regions.end();
				I != E; ++I) {
				const MemRegion *MR = *I;
				Invalidated.insert(MR);

				SuperRegions.insert(MR);
				while (const SubRegion *SR = dyn_cast<SubRegion>(MR)) {
				MR = SR->getSuperRegion();
				SuperRegions.insert(MR);
				}
				}

				CStringLengthMapTy::Factory &F = state->get_context<CStringLengthMap>();

				// Then loop over the entries in the current state.
				for (CStringLengthMapTy::iterator I = Entries.begin(), E = Entries.end();
				I != E; ++I) {
				const MemRegion *MR = I.getKey();

				// Is this entry for a super-region of a changed region?
				if (SuperRegions.count(MR)) {
				Entries = F.remove(Entries, MR);
				continue;
				}

				// Is this entry for a sub-region of a changed region?
				const MemRegion *Super = MR;
				while (const SubRegion *SR = dyn_cast<SubRegion>(Super)) {
				Super = SR->getSuperRegion();
				if (Invalidated.count(Super)) {
				Entries = F.remove(Entries, MR);
				break;
				}
				}
				}

				return state->set<CStringLengthMap>(Entries);
				}
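
The invalidation rule in `checkRegionChanges` can be illustrated on a toy region tree. This is only a sketch under simplifying assumptions: `Region`, `LengthMap`, and `invalidate` are hypothetical stand-ins, and standard containers replace the immutable map and factory that the checker uses.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical toy region: each region only knows its super-region.
struct Region {
  const Region *Super;
};

using LengthMap = std::map<const Region *, int>;

// Drops every entry whose key is a changed region, an ancestor of a changed
// region, or a descendant of a changed region.
LengthMap invalidate(LengthMap Entries,
                     const std::vector<const Region *> &Changed) {
  std::set<const Region *> Invalidated, SuperRegions;

  // First collect the changed regions and all of their super-regions.
  for (const Region *R : Changed) {
    Invalidated.insert(R);
    for (const Region *S = R; S; S = S->Super)
      SuperRegions.insert(S);
  }

  // Then walk the entries and erase the affected ones.
  for (auto It = Entries.begin(); It != Entries.end();) {
    const Region *R = It->first;
    // Is the entry a changed region or one of its super-regions?
    bool Drop = SuperRegions.count(R) != 0;
    // Is the entry a sub-region of a changed region?
    for (const Region *S = R->Super; S && !Drop; S = S->Super)
      Drop = Invalidated.count(S) != 0;
    if (Drop)
      It = Entries.erase(It);
    else
      ++It;
  }
  return Entries;
}
```

In this sketch, changing a field nested in an outer region drops the entries for the field, its ancestors, and its descendants, while a sibling of the field and unrelated regions keep theirs.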

				// TODO: Is it useful?
				NoQ (Not Done): Yes it is. It gets invoked during exploded graph dumps and it's an invaluable debugging facility.

				steakhal (Author, Done): A strange observation to note here. In the implementation of the dump method, I use the provided `NL` and `Sep` parameters. However, in the `checker_messages` of a State dump, `Sep` seems to be substituted with the empty string. For example, taint::printTaint just ignores the `Sep` parameter, possibly to work around this issue.
				void CStringChecker::printState(raw_ostream &Out, ProgramStateRef State,
				const char *NL, const char *Sep) const {
				dumpCStringLengths(State, Out, NL, Sep);
				}
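
As a rough illustration of how a dump helper parameterized on a line terminator and a field separator behaves, here is a minimal sketch modeled loosely on the `dumpCStringLengths` declaration. `dumpLengths` and the `std::map<std::string, int>` stand-in are hypothetical; the real function operates on the program state and `raw_ostream`.

```cpp
#include <cassert>
#include <map>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the length map: region name -> known length.
using ToyLengths = std::map<std::string, int>;

// Prints a header followed by one "key<Sep>value<NL>" line per entry.
// Nothing is printed when the map is empty.
void dumpLengths(const ToyLengths &Items, std::ostream &Out,
                 const char *NL = "\n", const char *Sep = " : ") {
  if (!Items.empty())
    Out << "CString lengths:" << NL;
  for (const auto &Item : Items)
    Out << Item.first << Sep << Item.second << NL;
}
```

Passing an empty string as `Sep` makes each key run directly into its value, which is the kind of output the comment above describes for `checker_messages`.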