This is an archive of the discontinued LLVM Phabricator instance.

Differential D30909

[Analyzer] Finish taint propagation to derived symbols of tainted regions
Closed, Public

Authored by vlad.tsyrklevich on Mar 13 2017, 2:12 PM.

Details

Reviewers

zaks.anna
cfe-commits
NoQ

Summary

This is the second part of https://reviews.llvm.org/D28445. It extends taint propagation to cases where the tainted region is a sub-region and we can't taint a conjured symbol entirely. This required adding a new map in the GDM that maps tainted parent symbols to tainted sub-regions (to avoid a linear scan of the current TaintMap for appropriate symbols). With this change, tainting of structs and unions should work as expected.
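
To make the two-level lookup concrete, here is a toy model of the idea in the summary. It is only a sketch: `Symbol`, `Region`, `TaintState`, and `isWithin` are hypothetical stand-ins, and `std::map`/`std::set` replace the analyzer's immutable GDM maps. The point is that a query for a value derived from a parent symbol goes straight to that parent's entry instead of scanning every tainted symbol.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for SymbolRef; not the analyzer's type.
using Symbol = std::string;

// Hypothetical stand-in for SubRegion: a name plus a link to the super-region.
struct Region {
  std::string Name;
  const Region *Super; // nullptr for a base region
};

// True if R is Candidate itself or is nested anywhere inside it.
inline bool isWithin(const Region *R, const Region *Candidate) {
  for (; R; R = R->Super)
    if (R == Candidate)
      return true;
  return false;
}

struct TaintState {
  std::set<Symbol> FullyTainted;                           // whole-symbol taint
  std::map<Symbol, std::set<const Region *>> PartialTaint; // parent -> sub-regions

  void addPartialTaint(const Symbol &Parent, const Region *Sub) {
    assert(Sub && "a partial taint needs a region");
    PartialTaint[Parent].insert(Sub);
  }

  // Is the value derived from Parent at region R tainted?
  bool isDerivedTainted(const Symbol &Parent, const Region *R) const {
    if (FullyTainted.count(Parent))
      return true;
    auto It = PartialTaint.find(Parent); // keyed lookup, no scan of all taint
    if (It == PartialTaint.end())
      return false;
    for (const Region *Tainted : It->second)
      if (isWithin(R, Tainted))
        return true;
    return false;
  }
};
```

With a struct `s` whose field `s.a` is tainted under parent symbol `conj_1`, a query for `s.a` or anything nested in it succeeds, while a query for the sibling field `s.b` does not.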

Event Timeline

vlad.tsyrklevich created this revision. Mar 13 2017, 2:12 PM

Fix a stray assert()

Fix a stray assert(), correctly this time..

Thanks again for the awesome stuff! It took years for me to even come up with the idea.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
477	I'd think about this a bit more and come back. I need to understand how come that constructing a symbol manually is the right thing to do; that doesn't happen very often, but it seems correct here.
lib/StaticAnalyzer/Core/ProgramState.cpp
696	Just see if this pointer is null instead of a separate `contains<>` check?
797	Just see if this pointer is null instead of a separate `contains<>` check?
802	This could be made even stronger when there are multiple ways of constructing the same sub-region. For instance, `union { int x; char y[4]; } u; u.x = taint(); u.y[0]; // is tainted?` To handle such cases, we could try to see if byte offsets are nested, instead of checking `isSubRegionOf()`. I suggest adding a TODO (and maybe a FIXME-test), because it gets more and more complicated. Especially with symbolic offsets.
test/Analysis/taint-generic.c
205	Are we already supporting the case when we're tainting some elements of an array but not all of them, and this works as expected? Could we add such tests (regardless of whether we already handle them)?
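
The byte-offset idea from the comment on line 802 can be sketched as a plain range-containment check. This is not code from the patch: `ByteRange` and `isNestedIn` are hypothetical names, the offsets are concrete (symbolic offsets, which the comment flags as the hard part, are not modeled), and the usage figures assume a 4-byte `int`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte extent of a region, measured from its base region.
struct ByteRange {
  std::uint64_t Offset; // first byte covered
  std::uint64_t Size;   // number of bytes covered, must be > 0
};

// True if Inner lies entirely within Outer. Overflow of Offset + Size is not
// handled; this is a sketch for small, concrete offsets only.
inline bool isNestedIn(ByteRange Inner, ByteRange Outer) {
  assert(Inner.Size > 0 && Outer.Size > 0);
  return Inner.Offset >= Outer.Offset &&
         Inner.Offset + Inner.Size <= Outer.Offset + Outer.Size;
}
```

For the union in the comment, `u.x` would cover bytes 0 to 3 and `u.y[0]` byte 0, so `u.y[0]` is nested in the tainted range even though it is not a sub-region of `u.x`.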

Respond to Artem's comments.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
477	Indeed it is odd. The best justification I could come up with: LCVs are meant to be optimizations, their 'purpose' is to expose an SVal that hides SymbolRef values so that we can have a split store. We don't have to copy all of a compound value's SymbolRef mappings because LCVs are kept distinct. Hence to set/query/constrain region values you use SVals so that RegionStore can differentiate between LCVs and SymbolRef-backed SVals for the two different stores it contains. The taint interface, however, requires you to taint a SymbolRef, not an SVal. If we wanted, instead of doing this logic here, we could change getPointedToSymbol() to return an SVal and update usages of it accordingly, since that value is only passed on to ProgramState.isTainted()/ProgramState.addTaint() anyway. Then we could update addTaint/isTainted to perform this logic, hiding it from the checker. This still requires manually constructing a symbol, now it's just performed in the analyzer instead of in a checker. Not sure if that addresses the issue you were considering, but the idea that we need to 'undo' the LCV optimization hiding the SymbolRef to have a value to taint seems somewhat convincing to me. What do you think?
test/Analysis/taint-generic.c
205	It does work in that case. If you taint element X of region Y the current logic will be conservative and only mark element X as tainted, not X-i or X+i. This is also true for element 0, so if a programmer passes &array[0] but reads sizeof(array) bytes it will not correctly mark that. This is also a shortcoming of the invalidation code, so I don't think there's much to do until there's more general support for dealing with region extents.

NoQ added inline comments. Apr 5 2017, 5:30 AM

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
477	Hmm (!) I suggest adding a new function to the program state, that we'd call `addPartialTaint()` or something like that, and this function would accept a symbol and a region and would act identically to passing a derived symbol (from this symbol and that region) to `addTaint()` (but we wouldn't need to actually construct a derived symbol here). Such API would be easier to understand and use than the current approach that forces the user to construct a derived symbol manually in the checker code. Unfortunately, this checker's `getLCVSymbol()` would become a bit more complicated (having various return types depending on circumstances), but this misfortune seems more random than systematic to me. Since we're having this new kind of partial taint, why don't we expose it in the API.
test/Analysis/taint-generic.c
205	\o/

vlad.tsyrklevich marked 2 inline comments as done. Apr 8 2017, 3:09 PM

vlad.tsyrklevich added inline comments.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
477	I'm happy to implement it this way, but figured I'd ask why you prefer this approach first in the interest of keeping the TaintChecker simple! The optimal approach to me seems to be changing `getPointedToSymbol()` to `getPointedToSVal()` and having `addTaint(SVal)` call `addPartialTaint()` when it's passed an LCV sub-region. That way users of the taint interface like the TaintChecker have a clean way to add & check regardless of whether it's a SymbolRef or an LCV but the partial taint functionality is still exposed and documented for those who might want to use it in new ways. Just curious to understand your rationale. Thanks for the feedback!

danielmarjamaki added a subscriber: danielmarjamaki. Apr 12 2017, 4:38 AM

danielmarjamaki added inline comments.

include/clang/StaticAnalyzer/Core/PathSensitive/TaintManager.h
45	Nit: =0 is redundant
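
As background for this nit, the `GDMIndex()` idiom uses the address of a function-local static as a unique key, and the value stored there never matters. The sketch below shows the idiom outside the analyzer; `TagA`, `TagB`, `uniqueIndex`, and `indexValue` are hypothetical names. Objects with static storage duration are zero-initialized, which is why spelling out `= 0` changes nothing.

```cpp
#include <cassert> // used by the checks described in the note below

// Hypothetical tag types standing in for traits such as TaintMap.
struct TagA {};
struct TagB {};

// Each instantiation owns one function-local static; its address is the key.
template <typename Tag> void *uniqueIndex() {
  static int Index; // static storage duration: zero-initialized without "= 0"
  return &Index;
}

// Reads the stored value back, only to show that it starts out as zero.
template <typename Tag> int indexValue() {
  return *static_cast<int *>(uniqueIndex<Tag>());
}
```

Repeated calls to `uniqueIndex<TagA>()` return the same address, `uniqueIndex<TagB>()` returns a different one, and `indexValue<TagA>()` is 0.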

NoQ added inline comments. Apr 18 2017, 6:55 AM

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
477	Your idea actually looks good to me! I'd approve going this way. With this change to `addTaint(SVal)`, I suspect it'd need some extra documentation to explain what it does now.
lib/StaticAnalyzer/Core/ProgramState.cpp
694	The `SymRegions` name is a bit confusing because we often shorten `SymbolicRegion` to `SymRegion` (eg. in dumps), which is not what we mean.
700	I wonder if it's worth it to check if a super-region of this region is already tainted, and avoid adding the region in this scenario. I guess in practice it won't happen very often, because this code would most likely be executed just once per taint source. This probably deserves a comment though.

Update the logic to move the LCV symbol logic into ProgramState::addTaint(SVal) out of the GenericTaintChecker. This allows us to no longer have to synthesize a new SymbolDerived from a LazyCompoundVal. This also required adding a new addPartialTaint() function.
Update TaintedSymRegions name to TaintedSubRegions per @NoQ's comment.
I realized that the new partial taint logic did not respect the idea of TaintTagTypes, so I updated TaintSubRegion to include a TaintTagType and added appropriate logic to add/check them.
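
The shape of this update can be illustrated with a toy dispatcher. Everything here is a hypothetical stand-in rather than the patch's code: `Value` plays the role of an SVal that is either a plain symbol or a lazy compound value, strings replace symbols and regions, and an empty `LCVRegion` is this sketch's way of saying "the LCV covers the whole base region".

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

using Symbol = std::string; // stand-in for SymbolRef
using Region = std::string; // stand-in for a sub-region, identified by name

// Stand-in for SVal: either a plain symbol or a lazy compound value (LCV).
struct Value {
  bool IsLCV;
  Symbol Sym;       // the symbol, or the LCV's default-bound symbol ("" if none)
  Region LCVRegion; // region the LCV covers; "" means the base region
};

struct State {
  std::set<Symbol> Full;                       // whole-symbol taint
  std::set<std::pair<Symbol, Region>> Partial; // (parent symbol, sub-region)

  void addTaint(const Symbol &S) { Full.insert(S); }

  void addPartialTaint(const Symbol &Parent, const Region &Sub) {
    assert(!Parent.empty() && "partial taint needs a parent symbol");
    if (Sub.empty()) { // covering the base region is just full taint
      addTaint(Parent);
      return;
    }
    Partial.insert(std::make_pair(Parent, Sub));
  }

  // Single entry point for checkers: picks full or partial taint.
  void addTaint(const Value &V) {
    if (V.Sym.empty())
      return; // nothing to taint; stay defensive
    if (V.IsLCV)
      addPartialTaint(V.Sym, V.LCVRegion);
    else
      addTaint(V.Sym);
  }
};
```

A checker in this model only ever calls `addTaint(Value)`; whether that ends up as full or partial taint is decided inside the state, which mirrors the motivation given in the thread for keeping the checker simple.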

Herald added a subscriber: xazax.hun. May 9 2017, 8:29 PM

This is fantastic, thanks! I really like the shape of how it turned out to work.

Minor stuff and we're landing :)

lib/StaticAnalyzer/Core/ProgramState.cpp
651	Whitespace seems a bit off.
656–671	I still feel bad about producing API with very tricky pre-conditions. The LCV may have various forms - some may have an empty store with no symbols at all, and others may be full of direct bindings that make the symbol essentially irrelevant. However, because the taint API is designed to be defensive in cases when taint cannot be added, and that sounds like a good thing, I guess we've taken the right approach here :) I suggest commenting this more thoroughly though, something like: "If the SVal represents a structure, try to mass-taint all values within the structure. For now it only works efficiently on lazy compound values that were conjured during a conservative evaluation of a function - either as return values of functions that return structures or arrays by value, or as values of structures or arrays passed into the function by reference, directly or through pointer aliasing. Such lazy compound values are characterized by having exactly one binding in their captured store within their parent region, which is a conjured symbol default-bound to the base region of the parent region." Then inside `if (Sym)`: "If the parent region is a base region, we add taint to the whole conjured symbol. Otherwise, when the value represents a record-typed field within the conjured structure, we add partial taint for that symbol to that field."
698–700	Speaking of taint tags: right now we didn't add support for multiple taint tags per symbol (because we have only one taint tag to choose from), but `addTaint` overwrites the tag. I guess we should do the same for now.
712	Can we assert that the returned state is not empty, like in `addTaint`?

Minor updates & some clarification on the feedback

lib/StaticAnalyzer/Core/ProgramState.cpp
656–671	The pre-conditions for using the API are actually a bit simpler than what's exposed here. I spelled them out to make the logic for tainting LCVs explicit, but the following simpler logic works:
	if (auto LCV = V.getAs<nonloc::LazyCompoundVal>()) {
	  if (Optional<SVal> binding = getStateManager().StoreMgr->getDefaultBinding(*LCV)) {
	    if (SymbolRef Sym = binding->getAsSymbol()) {
	      return addPartialTaint(Sym, LCV->getRegion(), Kind);
	    }
	  }
	}
	This works because `addPartialTaint()` already performs the `getRegion() == getRegion()->getBaseRegion()` check and taints the parent symbol if the region is the base region. I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?
698–700	I believe this is the current behavior. On line 714 I presume ImmutableMap::add overrides a key if it's already present in the map but I couldn't trace down the Tree ADT implementation to confirm this.

NoQ added inline comments. May 12 2017, 2:22 AM

lib/StaticAnalyzer/Core/ProgramState.cpp
656–671	"The pre-conditions for using the API are actually a bit simpler than what's exposed here." I'm talking about the situation when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it. The user should ideally understand why it doesn't work that way. "I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?" The variant without the check looks easier to read, and it is kind of obvious that full taint is a special case of partial taint, so I'm for removing the check.
698–700	"I presume ImmutableMap::add overrides a key if it's already present in the map" Yep, it does. "I believe this is the current behavior." No, you early-return the original state if the full-taint map already contains the info for the whole symbol on line 703. Hmm. In fact, with my suggestion we'd be able to have full taint of one kind and partial taint of another kind. I guess it's all right.
807	I actually have no idea why this function accumulates things in the `Tainted` variable instead of returning :)
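
The overwrite behaviour confirmed in the 698–700 reply can be pictured with a small persistent-style map. This is a sketch using `std::map`, not `llvm::ImmutableMap` itself; `TagMap`, `addOverwriting`, and the second tag `TagOther` are hypothetical (the thread notes there is only one taint tag to choose from at this point).

```cpp
#include <cassert>
#include <map>
#include <string>

// TagGeneric mirrors the single existing tag; TagOther is made up for the demo.
enum TaintTag { TagGeneric = 0, TagOther = 1 };

using TagMap = std::map<std::string, TaintTag>;

// Persistent-style add: the input map is left untouched and a new map is
// returned in which Key maps to Tag, replacing any earlier value for Key.
inline TagMap addOverwriting(TagMap M, const std::string &Key, TaintTag Tag) {
  M[Key] = Tag;
  assert(M.at(Key) == Tag);
  return M;
}
```

Adding the same key twice with different tags yields a one-entry map holding the later tag, while the map from before the second add still holds the earlier one.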

Some stylistic & comment updates.

lib/StaticAnalyzer/Core/ProgramState.cpp
656–671	"when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it" Ah, so you're talking about the case where the LCV encompasses a value with both direct & default bindings, e.g. `int foo[1024]; foo[123] = rand(); taint(foo);`? In that case we will miss tainting the `rand` SymbolConjured. I suppose we could scan the region store for matching entries? In practice I think we're really only interested in tainting default bindings for some unknown network/user input anyway. Anyways, your comment makes more sense now. I've added it.
807	It made a little more stylistic sense before this change, but not so much now. I've updated them to return immediately.
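
The direct-versus-default binding case from the 656–671 reply can be modeled with a toy store. Nothing here is RegionStore code: `ArrayStore`, `load`, and `isLoadTainted` are hypothetical, and symbols are plain strings. The sketch only shows why tainting the default-bound symbol leaves a directly bound element untainted.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

using Symbol = std::string; // stand-in for SymbolRef

// Toy store for one array: a default binding plus per-index direct bindings.
struct ArrayStore {
  Symbol Default;               // e.g. conjured by an opaque call
  std::map<int, Symbol> Direct; // explicit writes shadow the default

  // A load sees the direct binding if there is one, else the default.
  const Symbol &load(int Index) const {
    assert(Index >= 0);
    auto It = Direct.find(Index);
    return It != Direct.end() ? It->second : Default;
  }
};

// A load is tainted only if the symbol it yields is in the tainted set.
inline bool isLoadTainted(const ArrayStore &S, const std::set<Symbol> &Tainted,
                          int Index) {
  return Tainted.count(S.load(Index)) != 0;
}
```

If `foo` has default symbol `conj_foo` and `foo[123]` is directly bound to `conj_rand`, tainting only `conj_foo` makes a load of `foo[0]` tainted but not a load of `foo[123]`.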

I'll land this. Thanks again for working on all that stuff!

This revision is now accepted and ready to land. May 29 2017, 7:46 AM

Uhm, messed up the phabricator link in rL304162, which should have been pointing here but points to D28445 instead.

We've broken something:

http://lab.llvm.org:8011/builders/sanitizer-x86_64-linux-fast/builds/5288/steps/check-clang%20asan/logs/stdio

I hope i fixed it in rL304170.

Yep, fixed indeed.

Revision Contents

Path	Size
include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h	15 lines
include/clang/StaticAnalyzer/Core/PathSensitive/TaintManager.h	10 lines
lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp	83 lines
lib/StaticAnalyzer/Core/ProgramState.cpp	103 lines
lib/StaticAnalyzer/Core/RegionStore.cpp	5 lines
test/Analysis/taint-generic.c	41 lines

Diff 98905

include/clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h

[37 lines not shown]

class CallEvent;		class CallEvent;
class CallEventManager;		class CallEventManager;

typedef std::unique_ptr<ConstraintManager>(*ConstraintManagerCreator)(		typedef std::unique_ptr<ConstraintManager>(*ConstraintManagerCreator)(
ProgramStateManager &, SubEngine *);		ProgramStateManager &, SubEngine *);
typedef std::unique_ptr<StoreManager>(*StoreManagerCreator)(		typedef std::unique_ptr<StoreManager>(*StoreManagerCreator)(
ProgramStateManager &);		ProgramStateManager &);
		typedef llvm::ImmutableMap<const SubRegion*, TaintTagType> TaintedSubRegions;
		typedef llvm::ImmutableMapRef<const SubRegion*, TaintTagType>
		TaintedSubRegionsRef;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// ProgramStateTrait - Traits used by the Generic Data Map of a ProgramState.		// ProgramStateTrait - Traits used by the Generic Data Map of a ProgramState.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

template <typename T> struct ProgramStatePartialTrait;		template <typename T> struct ProgramStatePartialTrait;

template <typename T> struct ProgramStateTrait {		template <typename T> struct ProgramStateTrait {
[28 lines not shown]	private:
friend class ExplodedGraph;		friend class ExplodedGraph;
friend class ExplodedNode;		friend class ExplodedNode;

ProgramStateManager *stateMgr;		ProgramStateManager *stateMgr;
Environment Env; // Maps a Stmt to its current SVal.		Environment Env; // Maps a Stmt to its current SVal.
Store store; // Maps a location to its current value.		Store store; // Maps a location to its current value.
GenericDataMap GDM; // Custom data stored by a client of this class.		GenericDataMap GDM; // Custom data stored by a client of this class.
unsigned refCount;		unsigned refCount;
		TaintedSubRegions::Factory TSRFactory;

/// makeWithStore - Return a ProgramState with the same values as the current		/// makeWithStore - Return a ProgramState with the same values as the current
/// state with the exception of using the specified Store.		/// state with the exception of using the specified Store.
ProgramStateRef makeWithStore(const StoreRef &store) const;		ProgramStateRef makeWithStore(const StoreRef &store) const;

void setStore(const StoreRef &storeRef);		void setStore(const StoreRef &storeRef);

public:		public:
[240 lines not shown]	public:
template <typename CB> CB		template <typename CB> CB
scanReachableSymbols(const MemRegion * const *beg,		scanReachableSymbols(const MemRegion * const *beg,
const MemRegion * const *end) const;		const MemRegion * const *end) const;

/// Create a new state in which the statement is marked as tainted.		/// Create a new state in which the statement is marked as tainted.
ProgramStateRef addTaint(const Stmt *S, const LocationContext *LCtx,		ProgramStateRef addTaint(const Stmt *S, const LocationContext *LCtx,
TaintTagType Kind = TaintTagGeneric) const;		TaintTagType Kind = TaintTagGeneric) const;

		/// Create a new state in which the value is marked as tainted.
		ProgramStateRef addTaint(SVal V, TaintTagType Kind = TaintTagGeneric) const;

/// Create a new state in which the symbol is marked as tainted.		/// Create a new state in which the symbol is marked as tainted.
ProgramStateRef addTaint(SymbolRef S,		ProgramStateRef addTaint(SymbolRef S,
TaintTagType Kind = TaintTagGeneric) const;		TaintTagType Kind = TaintTagGeneric) const;

/// Create a new state in which the region symbol is marked as tainted.		/// Create a new state in which the region symbol is marked as tainted.
ProgramStateRef addTaint(const MemRegion *R,		ProgramStateRef addTaint(const MemRegion *R,
TaintTagType Kind = TaintTagGeneric) const;		TaintTagType Kind = TaintTagGeneric) const;

		/// Create a new state in a which a sub-region of a given symbol is tainted.
		/// This might be necessary when referring to regions that can not have an
		/// individual symbol, e.g. if they are represented by the default binding of
		/// a LazyCompoundVal.
		ProgramStateRef addPartialTaint(SymbolRef ParentSym,
		const SubRegion *SubRegion,
		TaintTagType Kind = TaintTagGeneric) const;

/// Check if the statement is tainted in the current state.		/// Check if the statement is tainted in the current state.
bool isTainted(const Stmt *S, const LocationContext *LCtx,		bool isTainted(const Stmt *S, const LocationContext *LCtx,
TaintTagType Kind = TaintTagGeneric) const;		TaintTagType Kind = TaintTagGeneric) const;
bool isTainted(SVal V, TaintTagType Kind = TaintTagGeneric) const;		bool isTainted(SVal V, TaintTagType Kind = TaintTagGeneric) const;
bool isTainted(SymbolRef Sym, TaintTagType Kind = TaintTagGeneric) const;		bool isTainted(SymbolRef Sym, TaintTagType Kind = TaintTagGeneric) const;
bool isTainted(const MemRegion *Reg, TaintTagType Kind=TaintTagGeneric) const;		bool isTainted(const MemRegion *Reg, TaintTagType Kind=TaintTagGeneric) const;

//==---------------------------------------------------------------------==//		//==---------------------------------------------------------------------==//
[491 lines not shown]

include/clang/StaticAnalyzer/Core/PathSensitive/TaintManager.h

	[29 lines not shown]
	// from multiple translation units.			// from multiple translation units.
	struct TaintMap {};			struct TaintMap {};
	typedef llvm::ImmutableMap<SymbolRef, TaintTagType> TaintMapImpl;			typedef llvm::ImmutableMap<SymbolRef, TaintTagType> TaintMapImpl;
	template<> struct ProgramStateTrait<TaintMap>			template<> struct ProgramStateTrait<TaintMap>
	: public ProgramStatePartialTrait<TaintMapImpl> {			: public ProgramStatePartialTrait<TaintMapImpl> {
	static void *GDMIndex() { static int index = 0; return &index; }			static void *GDMIndex() { static int index = 0; return &index; }
	};			};

				/// The GDM component mapping derived symbols' parent symbols to their
				/// underlying regions. This is used to efficiently check whether a symbol is
				/// tainted when it represents a sub-region of a tainted symbol.
				struct DerivedSymTaint {};
				typedef llvm::ImmutableMap<SymbolRef, TaintedSubRegionsRef> DerivedSymTaintImpl;
				template<> struct ProgramStateTrait<DerivedSymTaint>
				: public ProgramStatePartialTrait<DerivedSymTaintImpl> {
				static void *GDMIndex() { static int index; return &index; }
				};

	class TaintManager {			class TaintManager {

	TaintManager() {}			TaintManager() {}
	};			};

	}			}
	}			}

	#endif			#endif

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp

[59 lines not shown]	private:

/// \brief Add taint sources on a post visit.		/// \brief Add taint sources on a post visit.
void addSourcesPost(const CallExpr *CE, CheckerContext &C) const;		void addSourcesPost(const CallExpr *CE, CheckerContext &C) const;

/// Check if the region the expression evaluates to is the standard input,		/// Check if the region the expression evaluates to is the standard input,
/// and thus, is tainted.		/// and thus, is tainted.
static bool isStdin(const Expr *E, CheckerContext &C);		static bool isStdin(const Expr *E, CheckerContext &C);

/// This is called from getPointedToSymbol() to resolve symbol references for		/// \brief Given a pointer argument, return the value it points to.
/// the region underlying a LazyCompoundVal. This is the default binding		static Optional<SVal> getPointedToSVal(CheckerContext &C, const Expr *Arg);
/// for the LCV, which could be a conjured symbol from a function call that
/// initialized the region. It only returns the conjured symbol if the LCV
/// covers the entire region, e.g. we avoid false positives by not returning
/// a default bindingc for an entire struct if the symbol for only a single
/// field or element within it is requested.
// TODO: Return an appropriate symbol for sub-fields/elements of an LCV so
// that they are also appropriately tainted.
static SymbolRef getLCVSymbol(CheckerContext &C,
nonloc::LazyCompoundVal &LCV);

/// \brief Given a pointer argument, get the symbol of the value it contains
/// (points to).
static SymbolRef getPointedToSymbol(CheckerContext &C, const Expr *Arg);

/// Functions defining the attack surface.		/// Functions defining the attack surface.
typedef ProgramStateRef (GenericTaintChecker::*FnCheck)(const CallExpr *,		typedef ProgramStateRef (GenericTaintChecker::*FnCheck)(const CallExpr *,
CheckerContext &C) const;		CheckerContext &C) const;
ProgramStateRef postScanf(const CallExpr *CE, CheckerContext &C) const;		ProgramStateRef postScanf(const CallExpr *CE, CheckerContext &C) const;
ProgramStateRef postSocket(const CallExpr *CE, CheckerContext &C) const;		ProgramStateRef postSocket(const CallExpr *CE, CheckerContext &C) const;
ProgramStateRef postRetTaint(const CallExpr *CE, CheckerContext &C) const;		ProgramStateRef postRetTaint(const CallExpr *CE, CheckerContext &C) const;

[90 lines not shown]	struct TaintPropagationRule {
inline bool isDestinationArgument(unsigned ArgNum) const {		inline bool isDestinationArgument(unsigned ArgNum) const {
return (std::find(DstArgs.begin(),		return (std::find(DstArgs.begin(),
DstArgs.end(), ArgNum) != DstArgs.end());		DstArgs.end(), ArgNum) != DstArgs.end());
}		}

static inline bool isTaintedOrPointsToTainted(const Expr *E,		static inline bool isTaintedOrPointsToTainted(const Expr *E,
ProgramStateRef State,		ProgramStateRef State,
CheckerContext &C) {		CheckerContext &C) {
return (State->isTainted(E, C.getLocationContext()) || isStdin(E, C) ||		if (State->isTainted(E, C.getLocationContext()) || isStdin(E, C))
(E->getType().getTypePtr()->isPointerType() &&		return true;
State->isTainted(getPointedToSymbol(C, E))));
		if (!E->getType().getTypePtr()->isPointerType())
		return false;

		Optional<SVal> V = getPointedToSVal(C, E);
		return (V && State->isTainted(*V));
}		}

/// \brief Pre-process a function which propagates taint according to the		/// \brief Pre-process a function which propagates taint according to the
/// taint rule.		/// taint rule.
ProgramStateRef process(const CallExpr *CE, CheckerContext &C) const;		ProgramStateRef process(const CallExpr *CE, CheckerContext &C) const;

};		};
};		};
[195 lines not shown]	if (ArgNum == ReturnValueIndex) {
continue;		continue;
}		}

// The arguments are pointer arguments. The data they are pointing at is		// The arguments are pointer arguments. The data they are pointing at is
// tainted after the call.		// tainted after the call.
if (CE->getNumArgs() < (ArgNum + 1))		if (CE->getNumArgs() < (ArgNum + 1))
return false;		return false;
const Expr* Arg = CE->getArg(ArgNum);		const Expr* Arg = CE->getArg(ArgNum);
SymbolRef Sym = getPointedToSymbol(C, Arg);		Optional<SVal> V = getPointedToSVal(C, Arg);
if (Sym)		if (V)
State = State->addTaint(Sym);		State = State->addTaint(*V);
}		}

// Clear up the taint info from the state.		// Clear up the taint info from the state.
State = State->remove<TaintArgsOnPostVisit>();		State = State->remove<TaintArgsOnPostVisit>();

if (State != C.getState()) {		if (State != C.getState()) {
C.addTransition(State);		C.addTransition(State);
return true;		return true;
[54 lines not shown]	if (checkSystemCall(CE, Name, C))
return true;		return true;

if (checkTaintedBufferSize(CE, FDecl, C))		if (checkTaintedBufferSize(CE, FDecl, C))
return true;		return true;

return false;		return false;
}		}

SymbolRef GenericTaintChecker::getLCVSymbol(CheckerContext &C,		Optional<SVal> GenericTaintChecker::getPointedToSVal(CheckerContext &C,
nonloc::LazyCompoundVal &LCV) {
StoreManager &StoreMgr = C.getStoreManager();

// getLCVSymbol() is reached in a PostStmt so we can always expect a default
// binding to exist if one is present.
if (Optional<SVal> binding = StoreMgr.getDefaultBinding(LCV)) {
SymbolRef Sym = binding->getAsSymbol();
if (!Sym)
return nullptr;

// If the LCV covers an entire base region return the default conjured symbol.
if (LCV.getRegion() == LCV.getRegion()->getBaseRegion())
return Sym;
}

// Otherwise, return a nullptr as there's not yet a functional way to taint
// sub-regions of LCVs.
return nullptr;
}

SymbolRef GenericTaintChecker::getPointedToSymbol(CheckerContext &C,
const Expr* Arg) {		const Expr* Arg) {
ProgramStateRef State = C.getState();		ProgramStateRef State = C.getState();
SVal AddrVal = State->getSVal(Arg->IgnoreParens(), C.getLocationContext());		SVal AddrVal = State->getSVal(Arg->IgnoreParens(), C.getLocationContext());
if (AddrVal.isUnknownOrUndef())		if (AddrVal.isUnknownOrUndef())
return nullptr;		return None;

Optional<Loc> AddrLoc = AddrVal.getAs<Loc>();		Optional<Loc> AddrLoc = AddrVal.getAs<Loc>();
if (!AddrLoc)		if (!AddrLoc)
return nullptr;		return None;

const PointerType *ArgTy =		const PointerType *ArgTy =
dyn_cast<PointerType>(Arg->getType().getCanonicalType().getTypePtr());		dyn_cast<PointerType>(Arg->getType().getCanonicalType().getTypePtr());
SVal Val = State->getSVal(*AddrLoc,		return State->getSVal(*AddrLoc, ArgTy ? ArgTy->getPointeeType(): QualType());
ArgTy ? ArgTy->getPointeeType(): QualType());

if (auto LCV = Val.getAs<nonloc::LazyCompoundVal>())
return getLCVSymbol(C, *LCV);

return Val.getAsSymbol();
}		}

ProgramStateRef		ProgramStateRef
GenericTaintChecker::TaintPropagationRule::process(const CallExpr *CE,		GenericTaintChecker::TaintPropagationRule::process(const CallExpr *CE,
CheckerContext &C) const {		CheckerContext &C) const {
ProgramStateRef State = C.getState();		ProgramStateRef State = C.getState();

// Check for taint in arguments.		// Check for taint in arguments.
[103 lines not shown]	ProgramStateRef GenericTaintChecker::postScanf(const CallExpr *CE,
if (CE->getNumArgs() < 2)		if (CE->getNumArgs() < 2)
return State;		return State;

// All arguments except for the very first one should get taint.		// All arguments except for the very first one should get taint.
for (unsigned int i = 1; i < CE->getNumArgs(); ++i) {		for (unsigned int i = 1; i < CE->getNumArgs(); ++i) {
// The arguments are pointer arguments. The data they are pointing at is		// The arguments are pointer arguments. The data they are pointing at is
// tainted after the call.		// tainted after the call.
const Expr* Arg = CE->getArg(i);		const Expr* Arg = CE->getArg(i);
SymbolRef Sym = getPointedToSymbol(C, Arg);		Optional<SVal> V = getPointedToSVal(C, Arg);
if (Sym)		if (V)
State = State->addTaint(Sym);		State = State->addTaint(*V);
}		}
return State;		return State;
}		}

ProgramStateRef GenericTaintChecker::postRetTaint(const CallExpr *CE,		ProgramStateRef GenericTaintChecker::postRetTaint(const CallExpr *CE,
CheckerContext &C) const {		CheckerContext &C) const {
return C.getState()->addTaint(CE, C.getLocationContext());		return C.getState()->addTaint(CE, C.getLocationContext());
}		}
[58 lines not shown]

bool GenericTaintChecker::generateReportIfTainted(const Expr *E,		bool GenericTaintChecker::generateReportIfTainted(const Expr *E,
const char Msg[],		const char Msg[],
CheckerContext &C) const {		CheckerContext &C) const {
assert(E);		assert(E);

// Check for taint.		// Check for taint.
ProgramStateRef State = C.getState();		ProgramStateRef State = C.getState();
const SymbolRef PointedToSym = getPointedToSymbol(C, E);		Optional<SVal> PointedToSVal = getPointedToSVal(C, E);
SVal TaintedSVal;		SVal TaintedSVal;
if (State->isTainted(PointedToSym))		if (PointedToSVal && State->isTainted(*PointedToSVal))
TaintedSVal = nonloc::SymbolVal(PointedToSym);		TaintedSVal = *PointedToSVal;
else if (State->isTainted(E, C.getLocationContext()))		else if (State->isTainted(E, C.getLocationContext()))
TaintedSVal = C.getSVal(E);		TaintedSVal = C.getSVal(E);
else		else
return false;		return false;

// Generate diagnostic.		// Generate diagnostic.
if (ExplodedNode *N = C.generateNonFatalErrorNode()) {		if (ExplodedNode *N = C.generateNonFatalErrorNode()) {
initBugType();		initBugType();
[88 lines not shown]

lib/StaticAnalyzer/Core/ProgramState.cpp

[638 lines not shown]
}		}

ProgramStateRef ProgramState::addTaint(const Stmt *S,		ProgramStateRef ProgramState::addTaint(const Stmt *S,
const LocationContext *LCtx,		const LocationContext *LCtx,
TaintTagType Kind) const {		TaintTagType Kind) const {
if (const Expr *E = dyn_cast_or_null<Expr>(S))		if (const Expr *E = dyn_cast_or_null<Expr>(S))
S = E->IgnoreParens();		S = E->IgnoreParens();

SymbolRef Sym = getSVal(S, LCtx).getAsSymbol();		return addTaint(getSVal(S, LCtx), Kind);
		}

		ProgramStateRef ProgramState::addTaint(SVal V,
		TaintTagType Kind) const {
		NoQ: Whitespace seems a bit off.
		SymbolRef Sym = V.getAsSymbol();
if (Sym)		if (Sym)
return addTaint(Sym, Kind);		return addTaint(Sym, Kind);

const MemRegion *R = getSVal(S, LCtx).getAsRegion();		// If the SVal represents a structure, try to mass-taint all values within the
addTaint(R, Kind);		// structure. For now it only works efficiently on lazy compound values that
		// were conjured during a conservative evaluation of a function - either as
		// return values of functions that return structures or arrays by value, or as
		// values of structures or arrays passed into the function by reference,
		// directly or through pointer aliasing. Such lazy compound values are
		// characterized by having exactly one binding in their captured store within
		// their parent region, which is a conjured symbol default-bound to the base
		// region of the parent region.
		if (auto LCV = V.getAs<nonloc::LazyCompoundVal>()) {
		if (Optional<SVal> binding = getStateManager().StoreMgr->getDefaultBinding(*LCV)) {
		if (SymbolRef Sym = binding->getAsSymbol())
		return addPartialTaint(Sym, LCV->getRegion(), Kind);
		}
		}

		NoQ: I still feel bad about producing an API with very tricky pre-conditions. The LCV may have various forms: some may have an empty store with no symbols at all, and others may be full of direct bindings that make the symbol essentially irrelevant. However, because the taint API is designed to be defensive in cases when taint cannot be added, and that sounds like a good thing, I guess we've taken the right approach here :) I suggest commenting this more thoroughly though, something like:
		"If the SVal represents a structure, try to mass-taint all values within the structure. For now it only works efficiently on lazy compound values that were conjured during a conservative evaluation of a function - either as return values of functions that return structures or arrays by value, or as values of structures or arrays passed into the function by reference, directly or through pointer aliasing. Such lazy compound values are characterized by having exactly one binding in their captured store within their parent region, which is a conjured symbol default-bound to the base region of the parent region."
		Then inside `if (Sym)`: "If the parent region is a base region, we add taint to the whole conjured symbol. Otherwise, the value represents a record-typed field within the conjured structure, so we add partial taint for that symbol to that field."
		vlad.tsyrklevich (Author): The pre-conditions for using the API are actually a bit simpler than what's exposed here. I spelled it out to make the logic for tainting LCVs explicit, but the following simpler logic works: `if (auto LCV = V.getAs<nonloc::LazyCompoundVal>()) { if (Optional<SVal> binding = getStateManager().StoreMgr->getDefaultBinding(*LCV)) { if (SymbolRef Sym = binding->getAsSymbol()) { return addPartialTaint(Sym, LCV->getRegion(), Kind); } } }` This works because `addPartialTaint()` already performs the `getRegion() == getRegion()->getBaseRegion()` check and taints the parent symbol if the region covers the base region. I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?
		NoQ:
		> The pre-conditions for using the API are actually a bit simpler than what's exposed here.
		I'm talking about the situation when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it. The user should ideally understand why it doesn't work that way.
		> I chose to replicate it here to make the logic more explicit, but now that I've written this down the overhead of duplicating the logic seems unnecessary. Do you agree?
		The variant without the check looks easier to read, and it is kind of obvious that full taint is a special case of partial taint, so I'm for removing the check.
		vlad.tsyrklevich (Author):
		> when we add the partial taint to the default-bound symbol but it has no effect because there's a direct binding in the lazy compound value on top of it
		Ah, so you're talking about the case where the LCV encompasses a value with both direct and default bindings, e.g. `int foo[1024]; foo[123] = rand(); taint(foo);`? In that case we will miss tainting the `rand` SymbolConjured. I suppose we could scan the region store for matching entries? In practice I think we're really only interested in tainting default bindings for some unknown network/user input anyway. Your comment makes more sense now. I've added it.
// Cannot add taint, so just return the state.		const MemRegion *R = V.getAsRegion();
return this;		return addTaint(R, Kind);
}		}
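
Editorial sketch for the thread above: the limitation NoQ describes (taint placed on the default-bound symbol has no effect on an element that also has a direct binding) can be illustrated with a toy model. This is not analyzer code. `ToyStore`, `taintDefaultBinding`, `isElementTainted`, and the string symbol names are hypothetical, and the model ignores derived symbols entirely.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy model (hypothetical, not RegionStore): one array-like base region with
// an optional default binding plus direct bindings keyed by element index.
struct ToyStore {
  std::string DefaultSym;                    // empty means "no default binding"
  std::map<int, std::string> DirectBindings; // element index -> symbol

  // A direct binding shadows the default binding for that element.
  std::string lookup(int Index) const {
    assert(Index >= 0 && "element index must be non-negative");
    auto It = DirectBindings.find(Index);
    return It != DirectBindings.end() ? It->second : DefaultSym;
  }
};

// Taint only the default-bound symbol, loosely mirroring the LCV branch of
// addTaint(SVal) in the patch.
inline void taintDefaultBinding(const ToyStore &S,
                                std::set<std::string> &Tainted) {
  if (!S.DefaultSym.empty())
    Tainted.insert(S.DefaultSym);
}

// An element is tainted only if the symbol a load would see is tainted.
inline bool isElementTainted(const ToyStore &S, int Index,
                             const std::set<std::string> &Tainted) {
  const std::string Sym = S.lookup(Index);
  return !Sym.empty() && Tainted.count(Sym) != 0;
}
```

In this model, the `foo[123] = rand(); taint(foo);` case from the thread corresponds to a store with a default symbol and one direct binding at index 123: after `taintDefaultBinding`, every element except 123 reports as tainted.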

ProgramStateRef ProgramState::addTaint(const MemRegion *R,		ProgramStateRef ProgramState::addTaint(const MemRegion *R,
TaintTagType Kind) const {		TaintTagType Kind) const {
if (const SymbolicRegion *SR = dyn_cast_or_null<SymbolicRegion>(R))		if (const SymbolicRegion *SR = dyn_cast_or_null<SymbolicRegion>(R))
return addTaint(SR->getSymbol(), Kind);		return addTaint(SR->getSymbol(), Kind);
return this;		return this;
}		}

ProgramStateRef ProgramState::addTaint(SymbolRef Sym,		ProgramStateRef ProgramState::addTaint(SymbolRef Sym,
TaintTagType Kind) const {		TaintTagType Kind) const {
// If this is a symbol cast, remove the cast before adding the taint. Taint		// If this is a symbol cast, remove the cast before adding the taint. Taint
// is cast agnostic.		// is cast agnostic.
while (const SymbolCast *SC = dyn_cast<SymbolCast>(Sym))		while (const SymbolCast *SC = dyn_cast<SymbolCast>(Sym))
Sym = SC->getOperand();		Sym = SC->getOperand();

ProgramStateRef NewState = set<TaintMap>(Sym, Kind);		ProgramStateRef NewState = set<TaintMap>(Sym, Kind);
assert(NewState);		assert(NewState);
return NewState;		return NewState;
}		}

		NoQ: The `SymRegions` name is a bit confusing because we often shorten `SymbolicRegion` to `SymRegion` (e.g. in dumps), which is not what we mean.
		ProgramStateRef ProgramState::addPartialTaint(SymbolRef ParentSym,
		const SubRegion *SubRegion,
		NoQ: Just see if this pointer is null instead of a separate `contains<>` check?
		TaintTagType Kind) const {
		// Ignore partial taint if the entire parent symbol is already tainted.
		if (contains<TaintMap>(ParentSym) && *get<TaintMap>(ParentSym) == Kind)
		return this;
		NoQ: I wonder if it's worth it to check if a super-region of this region is already tainted, and avoid adding the region in this scenario. I guess in practice it won't happen very often, because this code would most likely be executed just once per taint source. This probably deserves a comment though.
		NoQ: Speaking of taint tags: right now we didn't add support for multiple taint tags per symbol (because we have only one taint tag to choose from), but `addTaint` overwrites the tag. I guess we should do the same for now.
		vlad.tsyrklevich (Author): I believe this is the current behavior. On line 714 I presume ImmutableMap::add overrides a key if it's already present in the map, but I couldn't trace down the Tree ADT implementation to confirm this.
		NoQ:
		> I presume ImmutableMap::add overrides a key if it's already present in the map
		Yep, it does.
		> I believe this is the current behavior.
		No, you early-return the original state if the full-taint map already contains the info for the whole symbol on line 703. Hmm. In fact, with my suggestion we'd be able to have full taint of one kind and partial taint of another kind. I guess it's all right.

		// Partial taint applies if only a portion of the symbol is tainted.
		if (SubRegion == SubRegion->getBaseRegion())
		return addTaint(ParentSym, Kind);

		TaintedSubRegionsRef TaintedSubRegions(0, TSRFactory.getTreeFactory());
		if (const TaintedSubRegionsRef *SavedTaintedRegions =
		get<DerivedSymTaint>(ParentSym))
		TaintedSubRegions = *SavedTaintedRegions;

		TaintedSubRegions = TaintedSubRegions.add(SubRegion, Kind);
		ProgramStateRef NewState = set<DerivedSymTaint>(ParentSym, TaintedSubRegions);
		NoQ: Can we assert that the returned state is not empty, like in `addTaint`?
		assert(NewState);
		return NewState;
		}
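
Editorial sketch: the bookkeeping in `addPartialTaint()` and the matching sub-region check in `isTainted()` can be modelled outside the analyzer roughly as follows. Everything here is a hypothetical stand-in (`ToyRegion`, `ToyTaintState`, `isDerivedTainted`, string symbols), and unlike `ProgramState` the toy state mutates in place instead of returning a new immutable state.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for MemRegion: a region knows only its super-region.
struct ToyRegion {
  const ToyRegion *Super; // nullptr for a base region

  bool isSubRegionOf(const ToyRegion *R) const {
    for (const ToyRegion *P = Super; P; P = P->Super)
      if (P == R)
        return true;
    return false;
  }

  const ToyRegion *getBaseRegion() const {
    const ToyRegion *R = this;
    while (R->Super)
      R = R->Super;
    return R;
  }
};

struct ToyTaintState {
  using Tag = unsigned;
  std::map<std::string, Tag> TaintMap; // fully tainted symbols
  // Parent symbol -> tainted sub-regions, so lookups need no linear scan
  // over TaintMap.
  std::map<std::string, std::map<const ToyRegion *, Tag>> DerivedSymTaint;

  void addPartialTaint(const std::string &Parent, const ToyRegion *Sub,
                       Tag Kind) {
    assert(Sub && "sub-region must not be null");
    // Ignore partial taint if the entire parent symbol is already tainted.
    auto It = TaintMap.find(Parent);
    if (It != TaintMap.end() && It->second == Kind)
      return;
    // Taint over a base region is full taint of the parent symbol.
    if (Sub == Sub->getBaseRegion()) {
      TaintMap[Parent] = Kind;
      return;
    }
    DerivedSymTaint[Parent][Sub] = Kind;
  }

  // Is a value derived from Parent at region R tainted with Kind?
  bool isDerivedTainted(const std::string &Parent, const ToyRegion *R,
                        Tag Kind) const {
    auto Full = TaintMap.find(Parent);
    if (Full != TaintMap.end() && Full->second == Kind)
      return true;
    auto It = DerivedSymTaint.find(Parent);
    if (It == DerivedSymTaint.end())
      return false;
    for (const auto &Entry : It->second)
      if (Entry.second == Kind &&
          (R == Entry.first || R->isSubRegionOf(Entry.first)))
        return true;
    return false;
  }
};
```

With a base region and two element regions under it, tainting only the second element leaves the first untainted, while a field nested inside the second element is reported as tainted, which matches the element-wise behavior discussed for `testStructArray()`.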

bool ProgramState::isTainted(const Stmt *S, const LocationContext *LCtx,		bool ProgramState::isTainted(const Stmt *S, const LocationContext *LCtx,
TaintTagType Kind) const {		TaintTagType Kind) const {
if (const Expr *E = dyn_cast_or_null<Expr>(S))		if (const Expr *E = dyn_cast_or_null<Expr>(S))
S = E->IgnoreParens();		S = E->IgnoreParens();

SVal val = getSVal(S, LCtx);		SVal val = getSVal(S, LCtx);
return isTainted(val, Kind);		return isTainted(val, Kind);
}		}
[24 lines not shown]	bool ProgramState::isTainted(const MemRegion *Reg, TaintTagType K) const {
return false;		return false;
}		}

bool ProgramState::isTainted(SymbolRef Sym, TaintTagType Kind) const {		bool ProgramState::isTainted(SymbolRef Sym, TaintTagType Kind) const {
if (!Sym)		if (!Sym)
return false;		return false;

// Traverse all the symbols this symbol depends on to see if any are tainted.		// Traverse all the symbols this symbol depends on to see if any are tainted.
bool Tainted = false;
for (SymExpr::symbol_iterator SI = Sym->symbol_begin(), SE =Sym->symbol_end();		for (SymExpr::symbol_iterator SI = Sym->symbol_begin(), SE =Sym->symbol_end();
SI != SE; ++SI) {		SI != SE; ++SI) {
if (!isa<SymbolData>(*SI))		if (!isa<SymbolData>(*SI))
continue;		continue;

const TaintTagType *Tag = get<TaintMap>(*SI);		if (const TaintTagType *Tag = get<TaintMap>(*SI)) {
Tainted = (Tag && *Tag == Kind);		if (*Tag == Kind)
		return true;
		}

		if (const SymbolDerived *SD = dyn_cast<SymbolDerived>(*SI)) {
// If this is a SymbolDerived with a tainted parent, it's also tainted.		// If this is a SymbolDerived with a tainted parent, it's also tainted.
if (const SymbolDerived *SD = dyn_cast<SymbolDerived>(*SI))		if (isTainted(SD->getParentSymbol(), Kind))
Tainted = Tainted || isTainted(SD->getParentSymbol(), Kind);		return true;

// If memory region is tainted, data is also tainted.		// If this is a SymbolDerived with the same parent symbol as another
if (const SymbolRegionValue *SRV = dyn_cast<SymbolRegionValue>(*SI))		// tainted SymbolDerived and a region that's a sub-region of that tainted
Tainted = Tainted || isTainted(SRV->getRegion(), Kind);		// symbol, it's also tainted.
		if (const TaintedSubRegionsRef *SymRegions =
		get<DerivedSymTaint>(SD->getParentSymbol())) {
		const TypedValueRegion *R = SD->getRegion();
		for (TaintedSubRegionsRef::iterator I = SymRegions->begin(),
		E = SymRegions->end();
		I != E; ++I) {
		// FIXME: The logic to identify tainted regions could be more
		// complete. For example, this would not currently identify
		// overlapping fields in a union as tainted. To identify this we can
		// check for overlapping/nested byte offsets.
		if (Kind == I->second &&
		(R == I->first || R->isSubRegionOf(I->first)))
		return true;
		}
		}
		}

// If If this is a SymbolCast from a tainted value, it's also tainted.		// If memory region is tainted, data is also tainted.
if (const SymbolCast *SC = dyn_cast<SymbolCast>(*SI))		if (const SymbolRegionValue *SRV = dyn_cast<SymbolRegionValue>(*SI)) {
Tainted = Tainted || isTainted(SC->getOperand(), Kind);		if (isTainted(SRV->getRegion(), Kind))
		return true;
		}

		NoQ: Just see if this pointer is null instead of a separate `contains<>` check?
if (Tainted)		// If this is a SymbolCast from a tainted value, it's also tainted.
		if (const SymbolCast *SC = dyn_cast<SymbolCast>(*SI)) {
		if (isTainted(SC->getOperand(), Kind))
return true;		return true;
}		}
		NoQ: This could be made even stronger when there are multiple ways of constructing the same sub-region. For instance: `union { int x; char y[4]; } u; u.x = taint(); u.y[0]; // is tainted?` To handle such cases, we could try to see if byte offsets are nested, instead of checking `isSubRegionOf()`. I suggest adding a TODO (and maybe a FIXME-test), because it gets more and more complicated. Especially with symbolic offsets.
		}

return Tainted;		return false;
}		}

		NoQ: I actually have no idea why this function accumulates things in the `Tainted` variable, instead of returning :)
		vlad.tsyrklevich (Author): It made a little more stylistic sense before this change, but not so much now. I've updated the checks to return immediately.
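
Editorial sketch: the FIXME in `isTainted()` suggests comparing byte offsets instead of relying on `isSubRegionOf()`. A minimal version of such a check, assuming concrete (non-symbolic) offsets and sizes, might look like the code below. `ByteRange`, `isNestedIn`, and `overlaps` are hypothetical names, not analyzer API, and symbolic offsets, which the review calls out as the hard part, are not handled.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical byte extent of a region, relative to its base region.
struct ByteRange {
  std::uint64_t Offset;
  std::uint64_t Size;
};

// True if Inner lies entirely within Outer.
inline bool isNestedIn(ByteRange Inner, ByteRange Outer) {
  assert(Inner.Size > 0 && Outer.Size > 0 && "extents must be non-empty");
  return Inner.Offset >= Outer.Offset &&
         Inner.Offset + Inner.Size <= Outer.Offset + Outer.Size;
}

// True if the two extents share at least one byte.
inline bool overlaps(ByteRange A, ByteRange B) {
  assert(A.Size > 0 && B.Size > 0 && "extents must be non-empty");
  return A.Offset < B.Offset + B.Size && B.Offset < A.Offset + A.Size;
}
```

For the union in the review comment, and assuming a 4-byte `int`, `u.x` would cover bytes 0 to 3 and `u.y[0]` byte 0, so `u.y[0]` is nested in `u.x` even though neither region is a sub-region of the other.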

lib/StaticAnalyzer/Core/RegionStore.cpp

[490 lines not shown]	public: // Part of public interface to class.
/// else		/// else
/// return symbolic		/// return symbolic
SVal getBinding(Store S, Loc L, QualType T) override {		SVal getBinding(Store S, Loc L, QualType T) override {
return getBinding(getRegionBindings(S), L, T);		return getBinding(getRegionBindings(S), L, T);
}		}

Optional<SVal> getDefaultBinding(Store S, const MemRegion *R) override {		Optional<SVal> getDefaultBinding(Store S, const MemRegion *R) override {
RegionBindingsRef B = getRegionBindings(S);		RegionBindingsRef B = getRegionBindings(S);
return B.getDefaultBinding(R);		// Default bindings are always applied over a base region so look up the
		// base region's default binding, otherwise the lookup will fail when R
		// is at an offset from R->getBaseRegion().
		return B.getDefaultBinding(R->getBaseRegion());
}		}

SVal getBinding(RegionBindingsConstRef B, Loc L, QualType T = QualType());		SVal getBinding(RegionBindingsConstRef B, Loc L, QualType T = QualType());

SVal getBindingForElement(RegionBindingsConstRef B, const ElementRegion *R);		SVal getBindingForElement(RegionBindingsConstRef B, const ElementRegion *R);

SVal getBindingForField(RegionBindingsConstRef B, const FieldRegion *R);		SVal getBindingForField(RegionBindingsConstRef B, const FieldRegion *R);

[1,972 lines not shown]
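
Editorial sketch: the `getDefaultBinding()` change above looks up the default binding on the base region because, as the added comment says, default bindings are applied over a base region and a lookup keyed on a region at an offset would miss. The toy below illustrates that failure mode only. `MiniRegion`, `MiniBindings`, `lookupExact`, and `lookupViaBase` are hypothetical and do not reflect how `RegionBindingsRef` is actually keyed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for MemRegion.
struct MiniRegion {
  const MiniRegion *Super; // nullptr for a base region

  const MiniRegion *getBaseRegion() const {
    const MiniRegion *R = this;
    while (R->Super)
      R = R->Super;
    return R;
  }
};

// Hypothetical store holding default bindings keyed on base regions only.
struct MiniBindings {
  std::map<const MiniRegion *, std::string> Defaults;

  // Exact-key lookup: misses whenever R is a sub-region.
  const std::string *lookupExact(const MiniRegion *R) const {
    auto It = Defaults.find(R);
    return It == Defaults.end() ? nullptr : &It->second;
  }

  // Normalize to the base region first, as the patched function does.
  const std::string *lookupViaBase(const MiniRegion *R) const {
    assert(R && "region must not be null");
    return lookupExact(R->getBaseRegion());
  }
};
```

Looking up a field region directly returns nothing in this model, while the base-normalized lookup finds the symbol bound to the enclosing base region.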

test/Analysis/taint-generic.c

[186 lines not shown]	void testStruct() {

sock = socket(AF_INET, SOCK_STREAM, 0);		sock = socket(AF_INET, SOCK_STREAM, 0);
read(sock, &tainted, sizeof(tainted));		read(sock, &tainted, sizeof(tainted));
__builtin_memcpy(buffer, tainted.buf, tainted.length); // expected-warning {{Untrusted data is used to specify the buffer size}}		__builtin_memcpy(buffer, tainted.buf, tainted.length); // expected-warning {{Untrusted data is used to specify the buffer size}}
}		}

void testStructArray() {		void testStructArray() {
struct {		struct {
char buf[16];
struct {
int length;		int length;
} st[1];		} tainted[4];
} tainted;

char buffer[16];		char dstbuf[16], srcbuf[16];
int sock;		int sock;

sock = socket(AF_INET, SOCK_STREAM, 0);		sock = socket(AF_INET, SOCK_STREAM, 0);
read(sock, &tainted.buf[0], sizeof(tainted.buf));		__builtin_memset(srcbuf, 0, sizeof(srcbuf));
read(sock, &tainted.st[0], sizeof(tainted.st));
// FIXME: tainted.st[0].length should be marked tainted		read(sock, &tainted[0], sizeof(tainted));
__builtin_memcpy(buffer, tainted.buf, tainted.st[0].length); // no-warning		__builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // expected-warning {{Untrusted data is used to specify the buffer size}}
		NoQ: Are we already supporting the case when we're tainting some elements of an array but not all of them, and this works as expected? Could we add such tests (regardless of whether we already handle them)?
		vlad.tsyrklevich (Author): It does work in that case. If you taint element X of region Y the current logic will be conservative and only mark element X as tainted, not X-i or X+i. This is also true for element 0, so if a programmer passes &array[0] but reads sizeof(array) bytes it will not correctly mark that. This is also a shortcoming of the invalidation code, so I don't think there's much to do until there's more general support for dealing with region extents.
		NoQ: \o/

		__builtin_memset(&tainted, 0, sizeof(tainted));
		read(sock, &tainted, sizeof(tainted));
		__builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // expected-warning {{Untrusted data is used to specify the buffer size}}

		__builtin_memset(&tainted, 0, sizeof(tainted));
		// If we taint element 1, we should not raise an alert on taint for element 0 or element 2
		read(sock, &tainted[1], sizeof(tainted));
		__builtin_memcpy(dstbuf, srcbuf, tainted[0].length); // no-warning
		__builtin_memcpy(dstbuf, srcbuf, tainted[2].length); // no-warning
		}

		void testUnion() {
		union {
		int x;
		char y[4];
		} tainted;

		char buffer[4];

		int sock = socket(AF_INET, SOCK_STREAM, 0);
		read(sock, &tainted.y, sizeof(tainted.y));
		// FIXME: overlapping regions aren't detected by isTainted yet
		__builtin_memcpy(buffer, tainted.y, tainted.x);
}		}

int testDivByZero() {		int testDivByZero() {
int x;		int x;
scanf("%d", &x);		scanf("%d", &x);
return 5/x; // expected-warning {{Division by a tainted value, possibly zero}}		return 5/x; // expected-warning {{Division by a tainted value, possibly zero}}
}		}

[46 lines not shown]