This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][taint] Remove taint from symbolic expressions if used in comparisons
AbandonedPublic

Authored by steakhal on Jan 28 2020, 2:36 AM.

Details

Reviewers
NoQ
Szelethus
Summary

Remove taint from symbolic expressions if used in comparison expressions.

Problem statement and background:
TaintConfig was introduced by D59555.
In that config file users are able to specify functions (sinks) which are emitting warnings if tainted values are passed to it.
This is great, but we don't have the facilities to suppress those warning.

Consider this example:

int idx;
scanf("%d", &idx);

if (idx < 0 || 42 < idx) { // tainted
  return -1;
}
mySink(idx); // Warning {{Untrusted data is passed to a user-defined sink}}
return idx;

Even though we know at the point of mySink is called we know that idx is properly constrained, mySink will emit warning since idx holds tainted value.

Considered solutions:
Describing value constraints in the taint config file is unfeasible.
We could loosen the rules for evaluating sink functions by checking taint only if the value is not constrained enough, but this would require a heuristic to decide that. I believe that no such heuristic would be satisfying.

Provided solution:
AFAIK the option we have left is to remove taint from certain symbolic expressions when a tainted expression occur in a comparison expression. This could be fine tuned by a heuristic, let's say:
Remove taint if exactly one operand of the comparison is tainted.
Ignore equality comparisons against null pointer constants.

Diff Detail

Event Timeline

steakhal created this revision.Jan 28 2020, 2:36 AM
NoQ requested changes to this revision.Jan 28 2020, 10:04 AM

Describing value constraints in the taint config file is unfeasible.

This is the only correct way to go, because, as you yourself point out, every sink function (or other use of tainted value) does indeed have different constraint requirements. Checking the wrong requirements is a very common source of security issues and we cannot afford destroying our ability to catch them.

Like, checking that the tainted value is non-zero is a good idea before dividing by that value, but it's clearly not sufficient before using the same value as an array index.

What exactly is preventing you from describing value constraints in the config file? Like, i get it that the generic case may get pretty rough (given that constraints may be potentially arbitrary algebraic expressions over function argument values and possibly other values), and i guess you could do a "poor man's" wildcard suppression for some sinks ("the constraint for this sink is so complicated that let's see if it was checked at all and think of it as fine if it was), but we definitely should be able to try harder when it matters.

This revision now requires changes to proceed.Jan 28 2020, 10:04 AM
In D73536#1845031, @NoQ wrote:

Describing value constraints in the taint config file is unfeasible.

This is the only correct way to go, because, as you yourself point out, every sink function (or other use of tainted value) does indeed have different constraint requirements.

Over the last couple months I've been pretty conflicted on config files. While I see that it is the correct solution, I also fear that just like attributes, they require tedious work to set up and maintain. With that said, its been a while since I've evaluated analyses that had taint analysis in the focus, so I have no concrete data on whether its worth trying to reduce their count, though I suspect they wouldn't show the entire picture, as very few checkers utilize taintedness.

What exactly is preventing you from describing value constraints in the config file?

This sounds like moving, or even worse duplicating the same checks both in a tool-specific config file and in the code. I sympathize with this as well:

int idx;
scanf("%d", &idx);

if (idx < 0 || 42 < idx) { // tainted
  return -1;
}
mySink(idx); // Warning {{Untrusted data is passed to a user-defined sink}}
return idx;

Even though we know at the point of mySink is called we know that idx is properly constrained, mySink will emit warning since idx holds tainted value.

This is valid, and I totally see how we can't possibly remove the taint (or in other words, prove to the analyzer that we properly checked the value) before passing it into a sink (as I understand it).

In summary, I think making decision like this is maybe a bit premature before we have some more results. It would be interesting to see what happens on larger projects once more checkers utilize taintedness, and act proactively, because

Checking the wrong requirements is a very common source of security issues and we cannot afford destroying our ability to catch them.

Szelethus retitled this revision from [analyser][taint] Remove taint from symbolic expressions if used in comparisons to [analyzer][taint] Remove taint from symbolic expressions if used in comparisons.Feb 5 2020, 5:39 AM

I'm convinced that we shouldn't remove taint from expressions used in comparisons.

With the current configuration files, sink functions are not too useful.
For now, I would delay developing a mechanism describing constraints here, since @martong is working on function summaries in D73897,D73898.
In function summaries we could describe how should a given function react to a tainted parameter. Which would draw sink functions in the taint config file meaningless.

I'm planning to abandon this patch if you don't have any comments.

I think its very good that this conversation came up, and it might just happen that we'll end up removing some taint when we have a better understanding of how this works. For now, I think we can put this aside :)

I think a crucial part of the design is what would we do for the following case:

if (x < y || x > z)
  return;
// Here we might not have ranges for x when y and z were symbolic. 
mySink(x); // requires x to be in [0, 255]

So would we warn for the code above? X is certainly in SOME bounds but we were not smart enough to figure out what. And these symbolic constraints are not recorded in the range based constraint manager.

If we want to avoid potential false positives on the code above we do need to somehow record symbolic constraints somewhere.

I genuinely think that in the following case we should warn, since the user already had a chance to express the range assumption using an assert.

I think that regardless which checker in what condition checks for a given constraint.
If the expression is tainted, we should warn each cases if the constraint cannot be proven.
If that is NOT tainted, we should conservatively assume that the precondition is satisfied.


PS: after checking the exploded graph for the following example, I recognized that the range based constraint solver is not smart enough to prove that x must be in range.
Even if we express the necessary information using asserts.
I'm not so sure about warning for this case, after seeing this :|

int scanf(const char *restrict format, ...);
void clang_analyzer_eval(int);

extern void __assert_fail (__const char *__assertion, __const char *__file,
    unsigned int __line, __const char *__function)
     __attribute__ ((__noreturn__));
#define assert(expr) \
  ((expr)  ? (void)(0)  : __assert_fail (#expr, __FILE__, __LINE__, __func__))


void foo(int y, int z) {
  assert(y <= 10);
  assert(z >= 20);
  int x;
  scanf("%d", &x);
  if (x < y || x > z)
    return;

  // x should be in range [10, 20]
  clang_analyzer_eval(0 <= x && x < 256);

  // we want to warn if x is not proven to be in that range
  // mySink(x); // requires x to be in [0, 255]
}

You cannot always have constant bounds. E.g. a dynamically allocated array size might depend on a variable.

steakhal abandoned this revision.Feb 10 2020, 2:07 AM