This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] alpha.security.DirtyScalar Checker
AbandonedPublic

Authored by gerazo on Dec 14 2016, 5:01 AM.

Details

Summary

Checker for catching tainted value usage without proper bound checking. Uses GenericTaintChecker which is also in alpha.security.

Diff Detail

Event Timeline

gerazo updated this revision to Diff 81370.Dec 14 2016, 5:01 AM
gerazo retitled this revision from to [analyzer] alpha.security.DirtyScalar Checker.
gerazo updated this object.
gerazo added reviewers: zaks.anna, dcoughlin.
gerazo added a subscriber: cfe-commits.
NoQ added a subscriber: NoQ.Dec 14 2016, 11:05 AM

Thank you for your work, Zoltán!
Did you checked if same warnings may be emitted by another checkers? For example, ArrayBoundChecker may warn if index is tainted.
I also have some comments below.

lib/StaticAnalyzer/Checkers/DirtyScalarChecker.cpp
47

CheckerContext &C to unify naming.

51

The default choice of such a value needs some explanation. It is also good to move a hard-coded value to a named constant, or, maybe a separate checker option.

52

ProgramStateRef State, const Stmt *S).
These names are usually used in analyzer and LLVM for ProgramState and Stmt, correspondingly.

54

We pass V by non-const reference, but it is not changed inside the function. So, we may use a constant reference here or even pass it by value (because it is small enough).

62

CallEvent::getOriginExpr() may return nullptr (in case of implicit destructor calls, for example) so we need to check the result before dyn_cast or dereference.

82
  1. Variable names should start with capital letter to follow LLVM naming convention.
  2. We may move CE->getNumArgs out of the loop in order not to re-evaluate it every time so the code will look like

for (unsigned int I = 0, E = CE->getNumArgs(); I < E; ++I) {.

We also can C++11-fy this loop:

for (const Expr *Arg : CE->arguments())
  checkUnbounded(C, State, Arg);
test/Analysis/dirty-scalar-unbounded.cpp
1
  1. It will be good to have tests for option set to true.
  2. Is there any test that makes usage of 'RecurseLimit` variable?
zaks.anna edited edge metadata.Dec 15 2016, 10:18 AM

Did you checked if same warnings may be emitted by another checkers? For example,
ArrayBoundChecker may warn if index is tainted.

I second that. The GenericTaintChecker also reports uses of tainted values. It is not clear that we should add a new separate checker instead of adding the missing cases to the existing checkers.

Thank you!
Anna.

gerazo updated this revision to Diff 82336.Dec 22 2016, 5:42 AM
gerazo edited edge metadata.
gerazo marked 6 inline comments as done.Dec 22 2016, 5:45 AM

Thank you very much for your help. I've added all suggested modifications including tests covering all checker option settings.

gerazo marked an inline comment as done.Dec 22 2016, 6:21 AM

So thank you again for the valuable questions.
In this checker, I give warnings for values which are both tainted and were also not checked by the programmer. So unlike GenericTaintChecker, I do implement the boundedness check here for certain, interesting constructs (which is controlled by the critical option). GenericTaintChecker focuses purely on taintedness, almost like a service for other checkers. I've added a new rule to it, improving the taintedness logic, but I felt mixing the bound checking logic into it would make the two ideas inseparable.

I've also checked others using tainted values. ArrayBoundCheckerV2 also works with array indexing and modifies its behaviour on finding tainted values. However, the main difference is that ArrayBoundCheckerV2 uses region information to check bounds which may or may not be present, while this checker can give a warning whenever any information from the constraint manager does not prove that any bound checking were performed on the value before (and potentially works on many other constructs where region information shouldn't be there at all). Not having correct region information with tainted values is typical when reading data from unknown sources. This dirty approach is better in this regard. ArrayBoundCheckerV2 on the other hand can give warnings solely based on region information. Yes, it can happen that both will give a warning as well for the same construct. Do you think it is distracting? Should I remove the array indexing checks from this checker (still it gives warning for pointer arithmetic as well)?

test/Analysis/dirty-scalar-unbounded.cpp
1

Now both settings are covered. For RecurseLimit, I've added a named constant and better explanation. This is only a practical limit to not let the analysis dive too deep into complex loop expressions. The limit currently set should be adequate, so it would not make sense to set it programmatically.

gerazo marked an inline comment as done.Dec 22 2016, 6:23 AM

Hi, did you have time to check my changes?

Hello Zoltan. Sorry, I'm a bit busy now. Here are my thoughts about the design.

  1. I think we should not add new warnings to GenericTaintChecker. Instead, it is better to move warnings out of it. After that it will become just a plugin used by other checkers. Such split will make it cleaner because we avoid mixing taint propagating logic and checks. It is not an item for this patch, of course. This new checker follows my preferences in this part.
  2. For me, taint-related check for array index in this checker and in ArrayBoundCheckerV2 look almost the same if the offset is known (with just a single difference). However, in case of tainted check we can be less conservative so approach used in this checker when a warning is emitted even if base region or offset are unknown looks better for me. I think we should remove tainted check from ArrayBoundV2 and leave it in this new checker. But this causes a situation where out-of-bound check is not done in checker intended for this. A possible solution here is to move array-related logic of DirtyScalar to a separate checker that is enabled when ArrayBound or DirtyScalar are enabled. Zoltán, could you confirm that your checker emits same warnings on ArrayBoundChecker test cases with taint?

Anna, what's your opinion? Did I miss something?

lib/StaticAnalyzer/Checkers/DirtyScalarChecker.cpp
185

Does the second check means that we exclude boolean and char values? I cannot find any reason to do it for chars.

In this checker, I give warnings for values which are both tainted and were also not checked by the programmer. So unlike GenericTaintChecker, I do implement the boundedness check here for certain, interesting constructs (which is controlled by the critical option). GenericTaintChecker focuses purely on taintedness, almost like a service for other checkers. I've added a new rule to it, improving the taintedness logic, but I felt mixing the bound checking logic into it would make the two ideas inseparable.

I'd like to step back a bit. In my view, the taint implementation should consist of three elements:

  1. taint source
  2. taint sink
  3. cleansing rules

I always considered the taint analysis in the analyzer not fully implemented because #3 was missing. It sounds a lot like non-"dirty" scalars would be the same as values that went through cleansing - they should be considered not tainted anymore. Now, are cleansing rules checker specific or generic? If they are generic, these rules should definitely be part of GenericTaintChecker and every checker using taint would utilize them.

WDYT?

(We can talk about the array bound checking part separately.)

gerazo marked an inline comment as done.Feb 28 2017, 5:50 AM
gerazo added inline comments.
lib/StaticAnalyzer/Checkers/DirtyScalarChecker.cpp
185

Yes, we exclude them.
Using lookup tables especially in cryptography sometimes involve reading a value from disk and than using this value immediately with a table lookup. This way, you use a dirty value directly in array indexing. Reading a byte and using it on a prepared 256 element table is common. As the read value gets bigger it is less performant and hence less common to do it.
I've found exactly 1 false positive in openssl without this exclusion.

gerazo marked an inline comment as done.Mar 3 2017, 7:38 AM

Hmm... I am thinking on this issue for a week now...

I've played with the idea of implementing cleansing rules in GenericTaintChecker. It would be elegant but unfortunately, I have to think they are not general. Cleansing of a string (because it has no terminating null at the end) is very different from integral type cleansing. A signed value has to be lower bound checked as well, an unsigned only gets upper bound treatment. It also turns out that the type itself also can't give enough information about the needed cleansing rule. A number used for array indexing can be properly bound checked on the region extents while a simple number can only be checked in a way that any upper bound checking was done at all on it... All this leads to the need of several types of taintednesses (string taintedness, array taintedness, general bound check taintedness) because the cleansing can only take down one type of taintedness at a time. That would be the only way for a checker to be able to access that the taintedness type specific to the checker is still there or was already cleansed by the central logic. For me it shows that cleansing rules belong to specific checkers and cannot be efficiently generalized even in case of a single int type.

About the ArrayBoundCheckerV2: I agree that no redundant checks should be done system-wide. I would also extend ArrayBoundCheckV2 or put array checking into a separate checker. Currently that checker and the one implemented here do not give the same results. ArrayBoundCheckerV2 knows more about the array but is not willing to give a warning without knowing region information for the array. The easiest way would be to use one checker's code from the other and find out if the other is active and would already give a warning anyway... but I understand that it is against current architectural policies.

All this leads to the need of several types of taintednesses (string taintedness, array taintedness, general bound check taintedness) because the cleansing can only take down one type of taintedness at a time. That would be the only way for a checker to be able to access that the taintedness type specific to the checker is still there or was already cleansed by the central logic. For me it shows that cleansing rules belong to specific checkers and cannot be efficiently generalized even in case of a single int type.

I do not see why we cannot have type-dependent rules in the general taint checker. Checker-specific taint rules could be added to the checkers on top of the generic rules. Though, I do not have an example of this off the top of my head. Specifically, I do not think we can issue a warning every time we are not 100% sure that an index into an array is not in bounds with respect to the region extents. It would trigger for cases when the code compares against a statically unknown value and lead to too many false positives.

Stepping back a bit, what do you consider "dirty" vs "clean"? It seems that you are looking for prove that the values are known to be within the bounds of min and max int values. What happens if there is a comparison to an unknown symbolic value? Should that be considered as clean or tainted? Are there test cases for this?

Stepping back a bit, what do you consider "dirty" vs "clean"? It seems that you are looking for prove that the values are known to be within the bounds of min and max int values. What happens if there is a comparison to an unknown symbolic value? Should that be considered as clean or tainted? Are there test cases for this?

I consider values as clean when they were checked by the programmer from both sides. However, my implementation purely works from constraints in effect (and using min and max is just the broadest constraint I could find). So you are totally right that comparison with unknown symbols is not covered nor in implementation, nor in tests. Can you suggest a universally working method which can handle all cases (e.g. complex expressions on both sides of the operator)? If we could find such an approach, that would be something which could really go into the GenericTaintChecker as an improvement. And I would gladly rewrite this whole stuff to fit the more general solution.

Before abandoning this patch and rewriting it, I would like to get a thumbs up for my plans: I will reimplement all functionality included here but without creating a new checker. Some parts which relate to specific checkers will be put into the corresponding checkers (like ArrayBoundCheckerV2). General ideas on taintedness (cleansing rules and usage warnings on standard types) will be put into GenericTaintChecker. We will see how it goes, will we have a smaller patch or not. WDYT?

dkrupp added a subscriber: dkrupp.Jun 14 2017, 7:03 AM
zaks.anna edited edge metadata.Jun 14 2017, 11:18 AM

This generally sounds good. Definitely do submit these changes in small pieces! See http://llvm.org/docs/DeveloperPolicy.html#incremental-development for rationale.

zaks.anna resigned from this revision.Aug 27 2017, 1:24 AM
gerazo abandoned this revision.Aug 28 2017, 2:31 AM