This is an archive of the discontinued LLVM Phabricator instance.

Differential D73536

[analyzer][taint] Remove taint from symbolic expressions if used in comparisons
Abandoned · Public

Authored by steakhal on Jan 28 2020, 2:36 AM.

Details

Reviewers

NoQ
Szelethus

Summary

Remove taint from symbolic expressions if used in comparison expressions.

Problem statement and background:
TaintConfig was introduced by D59555.
In that config file, users can specify functions (sinks) that emit a warning when a tainted value is passed to them.
This is great, but we have no facility to suppress those warnings.

Consider this example:

int idx;
scanf("%d", &idx);

if (idx < 0 || 42 < idx) { // tainted
  return -1;
}
mySink(idx); // Warning {{Untrusted data is passed to a user-defined sink}}
return idx;

Even though we know that idx is properly constrained at the point where mySink is called, mySink will still emit a warning, since idx holds a tainted value.

Considered solutions:
Describing value constraints in the taint config file is unfeasible.
We could loosen the rules for evaluating sink functions by checking taint only if the value is not constrained enough, but this would require a heuristic to decide that. I believe that no such heuristic would be satisfying.

Provided solution:
AFAIK the option we have left is to remove taint from certain symbolic expressions when a tainted expression occurs in a comparison expression. This could be fine-tuned by a heuristic, let's say:
Remove taint if exactly one operand of the comparison is tainted.
Ignore equality comparisons against null pointer constants.
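
The two rules above can be restated as a small decision function. This is a minimal sketch rather than the checker's code: `ComparisonInfo` and `shouldRemoveTaint` are hypothetical names introduced here for illustration, and the patch itself works on analyzer symbols rather than booleans.

```cpp
// Hypothetical model of the proposed heuristic; not part of the analyzer API.
struct ComparisonInfo {
  bool LHSTainted;         // left operand carries taint
  bool RHSTainted;         // right operand carries taint
  bool IsEqualityOp;       // operator is == or !=
  bool OtherIsNullPointer; // the untainted operand is a null pointer constant
};

// Returns true if the comparison should cleanse the tainted operand.
bool shouldRemoveTaint(const ComparisonInfo &Cmp) {
  // Rule 1: exactly one operand must be tainted.
  if (Cmp.LHSTainted == Cmp.RHSTainted)
    return false;
  // Rule 2: equality comparisons against a null pointer constant are ignored.
  if (Cmp.IsEqualityOp && Cmp.OtherIsNullPointer)
    return false;
  return true;
}
```

Under this model, `idx < 0` with a tainted `idx` removes the taint, while `home != 0` on a tainted pointer and `idx1 < idx2` with two tainted operands leave it in place.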

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

steakhal created this revision. Jan 28 2020, 2:36 AM

Herald added subscribers: cfe-commits, JDevlieghere. Jan 28 2020, 2:36 AM

steakhal added a subscriber: boga95. Jan 28 2020, 2:36 AM

Describing value constraints in the taint config file is unfeasible.

This is the only correct way to go, because, as you yourself point out, every sink function (or other use of tainted value) does indeed have different constraint requirements. Checking the wrong requirements is a very common source of security issues and we cannot afford destroying our ability to catch them.

Like, checking that the tainted value is non-zero is a good idea before dividing by that value, but it's clearly not sufficient before using the same value as an array index.
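
The point can be made concrete with two sinks that have different preconditions. The sketch below is illustrative only; `isSafeDivisor`, `isSafeIndex` and `passesWrongCheck` are hypothetical helpers, and unsigned values are assumed to keep the division case simple.

```cpp
#include <cstddef>

// Hypothetical precondition of a division sink: the divisor must be non-zero.
bool isSafeDivisor(unsigned Divisor) { return Divisor != 0; }

// Hypothetical precondition of an indexing sink: the index must be in bounds.
bool isSafeIndex(std::size_t Index, std::size_t Size) { return Index < Size; }

// True when a value passes the division check but is still out of bounds for
// an array of the given size, i.e. the wrong requirement was checked.
bool passesWrongCheck(unsigned Value, std::size_t Size) {
  return isSafeDivisor(Value) && !isSafeIndex(Value, Size);
}
```

A value such as 1000 is a fine divisor but an invalid index into an 8-element array, so a comparison that merely rules out zero should not make the value trusted for every sink.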

What exactly is preventing you from describing value constraints in the config file? Like, I get it that the generic case may get pretty rough (given that constraints may be potentially arbitrary algebraic expressions over function argument values and possibly other values), and I guess you could do a "poor man's" wildcard suppression for some sinks ("the constraint for this sink is so complicated that let's see if it was checked at all and think of it as fine if it was"), but we definitely should be able to try harder when it matters.
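
One way to picture per-sink constraints is a table from sink name to the range its argument must fall in. This is only a sketch under assumptions: the taint config file has no such feature in this revision, and `Range`, `SinkConstraints`, `Verdict` and `checkSinkCall` are hypothetical names.

```cpp
#include <map>
#include <string>

// Hypothetical closed interval [Min, Max].
struct Range {
  long Min;
  long Max;
};

// Hypothetical per-sink requirements, as they might be loaded from a config.
using SinkConstraints = std::map<std::string, Range>;

enum class Verdict { Ok, Warn };

// Known is the range that has been proven for the tainted argument.
Verdict checkSinkCall(const SinkConstraints &Config, const std::string &Sink,
                      Range Known) {
  auto It = Config.find(Sink);
  if (It == Config.end())
    return Verdict::Ok; // not a configured sink
  const Range &Required = It->second;
  // Accept the call only if the known range lies inside the required one.
  bool Proven = Required.Min <= Known.Min && Known.Max <= Required.Max;
  return Proven ? Verdict::Ok : Verdict::Warn;
}
```

With `mySink` configured as `[0, 42]`, the example from the summary would pass because the early return leaves `idx` in exactly that range, while a check that only established `idx <= 42` would still warn.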

This revision now requires changes to proceed. Jan 28 2020, 10:04 AM

In D73536#1845031, @NoQ wrote:

Describing value constraints in the taint config file is unfeasible.

This is the only correct way to go, because, as you yourself point out, every sink function (or other use of tainted value) does indeed have different constraint requirements.

Over the last couple of months I've been pretty conflicted on config files. While I see that it is the correct solution, I also fear that, just like attributes, they require tedious work to set up and maintain. With that said, it's been a while since I've evaluated analyses that had taint analysis in focus, so I have no concrete data on whether it's worth trying to reduce their count, though I suspect they wouldn't show the entire picture, as very few checkers utilize taintedness.

What exactly is preventing you from describing value constraints in the config file?

This sounds like moving, or even worse duplicating the same checks both in a tool-specific config file and in the code. I sympathize with this as well:

int idx;
scanf("%d", &idx);

if (idx < 0 || 42 < idx) { // tainted
  return -1;
}
mySink(idx); // Warning {{Untrusted data is passed to a user-defined sink}}
return idx;
Even though we know that idx is properly constrained at the point where mySink is called, mySink will still emit a warning, since idx holds a tainted value.

This is valid, and I totally see how we can't possibly remove the taint (or in other words, prove to the analyzer that we properly checked the value) before passing it into a sink (as I understand it).

In summary, I think making a decision like this is maybe a bit premature before we have some more results. It would be interesting to see what happens on larger projects once more checkers utilize taintedness, and act proactively, because

Checking the wrong requirements is a very common source of security issues and we cannot afford destroying our ability to catch them.

Szelethus retitled this revision from [analyser][taint] Remove taint from symbolic expressions if used in comparisons to [analyzer][taint] Remove taint from symbolic expressions if used in comparisons. Feb 5 2020, 5:39 AM

Herald added subscribers: donat.nagy, mikhail.ramalho, a.sidorin and 3 others. Feb 5 2020, 5:39 AM

I'm convinced that we shouldn't remove taint from expressions used in comparisons.

With the current configuration files, sink functions are not too useful.
For now, I would delay developing a mechanism describing constraints here, since @martong is working on function summaries in D73897 and D73898.
In function summaries we could describe how a given function should react to a tainted parameter, which would render sink functions in the taint config file meaningless.

I'm planning to abandon this patch if you don't have any comments.

I think it's very good that this conversation came up, and it might just happen that we'll end up removing some taint when we have a better understanding of how this works. For now, I think we can put this aside :)

I think a crucial part of the design is what we would do for the following case:

if (x < y || x > z)
  return;
// Here we might not have ranges for x when y and z were symbolic. 
mySink(x); // requires x to be in [0, 255]

So would we warn for the code above? x is certainly in SOME bounds, but we were not smart enough to figure out which. And these symbolic constraints are not recorded in the range-based constraint manager.

If we want to avoid potential false positives on the code above we do need to somehow record symbolic constraints somewhere.
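
The limitation can be illustrated with a toy store that keeps only concrete ranges. This is a deliberately simplified model, not the analyzer's range-based constraint manager; `Interval`, `RangeStore` and all of its methods are hypothetical.

```cpp
#include <climits>
#include <map>
#include <string>

// Toy closed interval [Lo, Hi] over int.
struct Interval {
  int Lo;
  int Hi;
};

// Toy range-only store: it can narrow a variable against constants, but a
// comparison against another symbol leaves no trace.
class RangeStore {
  std::map<std::string, Interval> Ranges;

  Interval &get(const std::string &Var) {
    // emplace does nothing if Var already has an entry.
    return Ranges.emplace(Var, Interval{INT_MIN, INT_MAX}).first->second;
  }

public:
  void assumeGE(const std::string &Var, int Bound) {
    Interval &I = get(Var);
    if (Bound > I.Lo)
      I.Lo = Bound;
  }
  void assumeLE(const std::string &Var, int Bound) {
    Interval &I = get(Var);
    if (Bound < I.Hi)
      I.Hi = Bound;
  }
  // Symbolic bounds (x >= y, x <= z) are dropped in this model.
  void assumeGESymbol(const std::string &, const std::string &) {}
  void assumeLESymbol(const std::string &, const std::string &) {}

  // True if the recorded range of Var is provably inside [Lo, Hi].
  bool provenWithin(const std::string &Var, int Lo, int Hi) const {
    Interval Known{INT_MIN, INT_MAX};
    auto It = Ranges.find(Var);
    if (It != Ranges.end())
      Known = It->second;
    return Lo <= Known.Lo && Known.Hi <= Hi;
  }
};
```

After `assumeGESymbol("x", "y")` and `assumeLESymbol("x", "z")` the store still cannot prove that `x` is in `[0, 255]`, which is the situation described for the snippet above; only constant bounds make the query succeed.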

I genuinely think that in the following case we should warn, since the user already had a chance to express the range assumption using an assert.

I think this should hold regardless of which checker checks a given constraint, and under what condition:
If the expression is tainted, we should warn in every case where the constraint cannot be proven.
If it is NOT tainted, we should conservatively assume that the precondition is satisfied.
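
Spelled out as code, the policy is a two-input decision. The function below is a hypothetical sketch of that rule only; `shouldWarn` is not an analyzer API.

```cpp
// Policy sketch: a tainted value must be proven to satisfy the precondition,
// while an untainted value gets the benefit of the doubt.
bool shouldWarn(bool IsTainted, bool ConstraintProven) {
  return IsTainted && !ConstraintProven;
}
```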

PS: after checking the exploded graph for the following example, I recognized that the range based constraint solver is not smart enough to prove that x must be in range.
Even if we express the necessary information using asserts.
I'm not so sure about warning for this case, after seeing this :|

int scanf(const char *restrict format, ...);
void clang_analyzer_eval(int);

extern void __assert_fail (__const char *__assertion, __const char *__file,
    unsigned int __line, __const char *__function)
     __attribute__ ((__noreturn__));
#define assert(expr) \
  ((expr)  ? (void)(0)  : __assert_fail (#expr, __FILE__, __LINE__, __func__))


void foo(int y, int z) {
  assert(y >= 10);
  assert(z <= 20);
  int x;
  scanf("%d", &x);
  if (x < y || x > z)
    return;

  // x should be in range [10, 20]
  clang_analyzer_eval(0 <= x && x < 256);

  // we want to warn if x is not proven to be in that range
  // mySink(x); // requires x to be in [0, 255]
}

You cannot always have constant bounds. E.g. a dynamically allocated array size might depend on a variable.
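
A small example of such a non-constant bound, written as a sketch with a hypothetical helper `readAt`: the valid index range depends on a size that is only known at run time, so it cannot be expressed as a fixed interval.

```cpp
#include <cstddef>
#include <vector>

// The valid range for Index is [0, Buffer.size()), where the size is a
// run-time value rather than a constant.
bool readAt(const std::vector<int> &Buffer, std::size_t Index, int &Out) {
  if (Index >= Buffer.size())
    return false; // out of bounds for this particular buffer
  Out = Buffer[Index];
  return true;
}
```

Here the check that makes `Index` safe compares it against another symbolic value, which is the kind of constraint the earlier comments note is not recorded as a range.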

steakhal abandoned this revision. Feb 10 2020, 2:07 AM

Revision Contents

Path                                                         Size
clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp    76 lines
clang/lib/StaticAnalyzer/Checkers/Taint.h                    3 lines
clang/lib/StaticAnalyzer/Checkers/Taint.cpp                  5 lines
clang/test/Analysis/taint-tester.c                           50 lines

Diff 240800

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp

@@ 30 lines not shown @@
 #include <utility>

 using namespace clang;
 using namespace ento;
 using namespace taint;

 namespace {
 class GenericTaintChecker
-    : public Checker<check::PostStmt<CallExpr>, check::PreStmt<CallExpr>> {
+    : public Checker<check::PostStmt<CallExpr>, check::PreStmt<CallExpr>,
+                     check::PostStmt<BinaryOperator>> {
 public:
   static void *getTag() {
     static int Tag;
     return &Tag;
   }

   void checkPostStmt(const CallExpr *CE, CheckerContext &C) const;

   void checkPreStmt(const CallExpr *CE, CheckerContext &C) const;

+  /// Heuristic to cleanse taint from symbolic expressions if that is used in
+  /// comparison expressions.
+  void checkPostStmt(const BinaryOperator *BinOp, CheckerContext &Ctx) const;
+
   void printState(raw_ostream &Out, ProgramStateRef State, const char *NL,
                   const char *Sep) const override;

   using ArgVector = SmallVector<unsigned, 2>;
   using SignedArgVector = SmallVector<int, 2>;

   enum class VariadicType { None, Src, Dst };

@@ 451 lines not shown @@ GenericTaintChecker::TaintPropagationRule::getTaintPropagationRule(
   if (It != CustomPropagations.end()) {
     const auto &Value = It->second;
     return Value.second;
   }

   return TaintPropagationRule();
 }

+static void
+collectAllTaintedSymbolsRecursively(SymbolRef Sym, ProgramStateRef State,
+                                    SmallVector<SymbolRef, 4> &result) {
+  switch (Sym->getKind()) {
+  case SymExpr::IntSymExprKind: {
+    const auto *IntSym = cast<IntSymExpr>(Sym);
+    collectAllTaintedSymbolsRecursively(IntSym->getRHS(), State, result);
+    break;
+  }
+  case SymExpr::SymIntExprKind: {
+    const auto *SymInt = cast<SymIntExpr>(Sym);
+    collectAllTaintedSymbolsRecursively(SymInt->getLHS(), State, result);
+    break;
+  }
+  case SymExpr::SymSymExprKind: {
+    const auto *SymSym = cast<SymSymExpr>(Sym);
+    collectAllTaintedSymbolsRecursively(SymSym->getLHS(), State, result);
+    collectAllTaintedSymbolsRecursively(SymSym->getRHS(), State, result);
+    break;
+  }
+  default:
+    if (taint::isTainted(State, Sym))
+      result.push_back(Sym);
+  }
+}
+
+/// If a comparison operator has exactly one tainted operand
+/// remove all tainted symbols that the operand depends on.
+/// Ignores (in)equality operator calls checking against NULL.
+void GenericTaintChecker::checkPostStmt(const BinaryOperator *BinOp,
+                                        CheckerContext &Ctx) const {
+  // Handle only (<,<=,>,>=,==,!=) operators.
+  if (!BinOp->isComparisonOp())
+    return;
+
+  SymbolRef SymLHS = Ctx.getSVal(BinOp->getLHS()).getAsSymExpr();
+  SymbolRef SymRHS = Ctx.getSVal(BinOp->getRHS()).getAsSymExpr();
+
+  ProgramStateRef State = Ctx.getState();
+  const bool TaintedLHS = taint::isTainted(State, SymLHS);
+  const bool TaintedRHS = taint::isTainted(State, SymRHS);
+
+  // Do nothing if both operands are tainted.
+  if (TaintedLHS && TaintedRHS)
+    return;
+
+  // Do nothing if none of the operands are tainted.
+  if (!TaintedLHS && !TaintedRHS)
+    return;
+
+  // Ignore comparisons (==,!=) of tainted pointers and NULL.
+  if (BinOp->isEqualityOp()) {
+    const Expr *OtherArgument = TaintedLHS ? BinOp->getRHS() : BinOp->getLHS();
+    const bool IsOtherNullExpr = OtherArgument->isNullPointerConstant(
+        Ctx.getASTContext(), Expr::NPC_ValueDependentIsNotNull);
+    if (IsOtherNullExpr)
+      return;
+  }
+
+  // Remove taint.
+  SmallVector<SymbolRef, 4> TaintedSubsymbols;
+  collectAllTaintedSymbolsRecursively((TaintedLHS ? SymLHS : SymRHS), State,
+                                      TaintedSubsymbols);
+  for (SymbolRef Sym : TaintedSubsymbols)
+    State = taint::removeTaint(State, Sym);
+
+  Ctx.addTransition(State);
+}
+
 void GenericTaintChecker::checkPreStmt(const CallExpr *CE,
                                        CheckerContext &C) const {
   Optional<FunctionData> FData = FunctionData::create(CE, C);
   if (!FData)
     return;

   // Check for taintedness related errors first: system call, uncontrolled
   // format string, tainted buffer size.
@@ 423 lines not shown @@
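
The effect of the recursive collection in the hunk above can be modelled outside the analyzer. The following is a toy sketch under simplifying assumptions: symbols are named leaves or binary nodes, taint is a set of leaf names, and `Sym`, `leaf`, `binary`, `collectTainted` and `removeTaintAfterComparison` are hypothetical stand-ins rather than analyzer types.

```cpp
#include <memory>
#include <set>
#include <string>
#include <vector>

// Toy symbolic expression: either a named leaf or a binary node.
struct Sym {
  std::string Name;              // non-empty for leaves
  std::shared_ptr<Sym> LHS, RHS; // children of a binary node
};
using SymPtr = std::shared_ptr<Sym>;
using TaintSet = std::set<std::string>;

SymPtr leaf(const std::string &Name) {
  return std::make_shared<Sym>(Sym{Name, nullptr, nullptr});
}
SymPtr binary(SymPtr L, SymPtr R) {
  return std::make_shared<Sym>(Sym{"", L, R});
}

// Walk the expression and collect every tainted leaf it depends on.
void collectTainted(const SymPtr &S, const TaintSet &Tainted,
                    std::vector<std::string> &Out) {
  if (!S)
    return;
  if (S->LHS || S->RHS) {
    collectTainted(S->LHS, Tainted, Out);
    collectTainted(S->RHS, Tainted, Out);
    return;
  }
  if (Tainted.count(S->Name))
    Out.push_back(S->Name);
}

// Models comparing Operand against an untainted value: every tainted leaf
// under Operand is removed from the taint set.
TaintSet removeTaintAfterComparison(const SymPtr &Operand, TaintSet Tainted) {
  std::vector<std::string> Found;
  collectTainted(Operand, Tainted, Found);
  for (const std::string &Name : Found)
    Tainted.erase(Name);
  return Tainted;
}
```

Comparing `binary(leaf("idx1"), leaf("idx2"))` against a constant clears both leaves in this model, which mirrors the `sum` case in the test file: one range check on a derived value untaints every input it was computed from.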

clang/lib/StaticAnalyzer/Checkers/Taint.h

@@ 39 lines not shown @@ LLVM_NODISCARD ProgramStateRef addTaint(ProgramStateRef State, SymbolRef Sym,
                                         TaintTagType Kind = TaintTagGeneric);

 /// Create a new state in which the pointer represented by the region
 /// is marked as tainted.
 LLVM_NODISCARD ProgramStateRef addTaint(ProgramStateRef State,
                                         const MemRegion *R,
                                         TaintTagType Kind = TaintTagGeneric);

+LLVM_NODISCARD ProgramStateRef removeTaint(ProgramStateRef State, const Stmt *S,
+                                           const LocationContext *LCtx);
+
 LLVM_NODISCARD ProgramStateRef removeTaint(ProgramStateRef State, SVal V);

 LLVM_NODISCARD ProgramStateRef removeTaint(ProgramStateRef State,
                                            const MemRegion *R);

 LLVM_NODISCARD ProgramStateRef removeTaint(ProgramStateRef State,
                                            SymbolRef Sym);

@@ 51 lines not shown @@

clang/lib/StaticAnalyzer/Checkers/Taint.cpp

@@ 86 lines not shown @@ ProgramStateRef taint::addTaint(ProgramStateRef State, SymbolRef Sym,
   while (const SymbolCast *SC = dyn_cast<SymbolCast>(Sym))
     Sym = SC->getOperand();

   ProgramStateRef NewState = State->set<TaintMap>(Sym, Kind);
   assert(NewState);
   return NewState;
 }

+ProgramStateRef taint::removeTaint(ProgramStateRef State, const Stmt *S,
+                                   const LocationContext *LCtx) {
+  return taint::removeTaint(State, State->getSVal(S, LCtx));
+}
+
 ProgramStateRef taint::removeTaint(ProgramStateRef State, SVal V) {
   SymbolRef Sym = V.getAsSymbol();
   if (Sym)
     return removeTaint(State, Sym);

   const MemRegion *R = V.getAsRegion();
   return removeTaint(State, R);
 }
@@ 150 lines not shown @@

clang/test/Analysis/taint-tester.c

@@ 61 lines not shown @@
 void BitwiseOp(int in, char inn) {
   // Taint on bitwise operations, integer to integer cast.
   int m;
   int x = 0;
   scanf("%d", &x);
   int y = (in << (x << in)) * 5;// expected-warning + {{tainted}}
   // The next line tests integer to integer cast.
   int z = y & inn; // expected-warning + {{tainted}}
-  if (y == 5) // expected-warning + {{tainted}}
-    m = z | z;// expected-warning + {{tainted}}
-  else
-    m = inn;
-  int mm = m; // expected-warning + {{tainted}}
+  if (y == 5) { // expected-warning + {{tainted}}
+    // Since the only tainted symbol y depended on was the value of x, the
+    // check on y in the condition marked the value of x not tainted anymore.
+    m = z | z; // no warning
+  } else {
+    m = inn; // no warning
+  }
+  int mm = m; // no warning
 }

 // Test getenv.
 char *getenv(const char *name);
 void getenvTest(char *home) {
   home = getenv("HOME"); // expected-warning + {{tainted}}
   if (home != 0) { // expected-warning + {{tainted}}
     char d = home[0]; // expected-warning + {{tainted}}
@@ 80 lines not shown @@ void getlineTest(void) {
   size_t len = 0;
   ssize_t read;
   while ((read = getline(&line, &len, stdin)) != -1) {
     printf("%s", line); // expected-warning + {{tainted}}
   }
   free(line); // expected-warning + {{tainted}}
 }

+int conditionRemovesTaintTest() {
+  int idx;
+  scanf("%d", &idx); // The value of idx becomes tainted.
+  // Relational operators comparing a tainted value to a non-tainted will
+  // remove taint.
+  if (idx < 0 || 42 < idx) { // expected-warning + {{tainted}}
+    int idx2 = idx; // no warning
+    return -1;
+  }
+  // Not tainted now, since appeared in the condition previously.
+  return idx; // no warning
+}
+
+int conditionDoesNotRemoveTaintTest() {
+  int idx1, idx2;
+  scanf("%d %d", &idx1, &idx2);
+
+  // Both operands of the comparison are tainted.
+  // Taint won't be removed.
+  if (idx1 < idx2) { // expected-warning + {{tainted}}
+    int tmp = idx1; // expected-warning + {{tainted}}
+    return -1;
+  }
+
+
+  int sum = idx1 + idx2; // expected-warning + {{tainted}}
+
+  // Relation operator removes taint from all dependent symbolic expressions.
+  if (0 <= sum && sum < 42) { // expected-warning {{tainted}}
+    int tmp1 = idx1; // no warning
+    int tmp2 = idx2; // no warning
+    int tmp3 = sum; // no warning
+  }
+
+  return idx1 + idx2 + sum; // no warning
+}
+
 // Test propagation functions - the ones that propagate taint from arguments to
 // return value, ptr arguments.

 int atoi(const char *nptr);
 long atol(const char *nptr);
 long long atoll(const char *nptr);

 void atoiTest() {
@@ 20 lines not shown @@