This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add custom filter functions for GenericTaintChecker
ClosedPublic

Authored by boga95 on Mar 18 2019, 3:25 PM.

Details

Summary

One can pass a configuration file to the checker with the following argument: -analyzer-config alpha.security.taint.TaintPropagation:Config=/path/to/the/file/taint-generic-config.yaml. The config file can contain:

  • Propagations: One can define functions which propagate or create the taintedness. It has five fields:
    • Name: The name of the function. Mandatory field.
    • SrcArgs: A list of arguments. If any of them tainted, the destination arguments will be marked tainted. It's not defined, the destination arguments always will be marked as tainted.
    • DstArgs: A list of arguments. Set the tainted flag for the arguments, if they are marked. The return value's index is 4294967294(it is temporary).
    • VarType: It's an enum with three possible values: None, Src, Dst. The default value is None and do nothing.
    • VarIndex: It's the first variadic argument for the function. If VarType == Src and any of them is tainted, the destination arguments will be marked ad tainted. If VarType == Dst and they are marked, all argument from the VarIndex will be marked as tainted.
  • Filters: One can define function remove the tainted flag if it is passed to the proper argument.
    • Name: The name of the function. Mandatory field.
    • Args: A list of arguments. If a tainted value is passed to it, the tainted flag will be removed. Mandatory field.
  • Sinks: A list of function which will give a warning if it gets a tainted value.
    • Name: The name of the function. Mandatory field.
    • Args: A list of arguments. If any of those arguments get a tainted value, it will give a warning. Mandatory field.

For the propagations, it uses the config to deduce the TaintPropagationRules from the function's name.
The filter functions are understandable as functions which mark their arguments not tainted. I improved the information flow from pre-visit to post-visit, therefore, the TaintTagType could be passed to the setTaint function. Currently, it only works if the argument is a pointer.

Diff Detail

Repository
rC Clang

Event Timeline

boga95 created this revision.Mar 18 2019, 3:25 PM
Szelethus requested changes to this revision.Mar 19 2019, 4:29 AM

I'm very much guilty of doing functional and refactoring changes within the same patch, but I think working on GenericTaintChecker AND in the same patch doing (seemingly unrelated) function name changes in ProgramState might be overkill -- Could you please divide this patch into smaller parts please?

This revision now requires changes to proceed.Mar 19 2019, 4:29 AM
boga95 updated this revision to Diff 191668.Mar 21 2019, 5:56 AM
boga95 retitled this revision from [analyzer] Make GenericTaintChecker configurable to [analyzer] Add custom filter functions for GenericTaintChecker.
Szelethus requested changes to this revision.Mar 25 2019, 6:10 AM

Same thing.

This revision now requires changes to proceed.Mar 25 2019, 6:10 AM

I add a new taint type, which represents a lack of taintedness. That's why I changed the name of addTaint() to setTaint(). Of course, it's not an important change, I can move it to another patch.

NoQ added a comment.Mar 26 2019, 6:47 PM

Hi, i wanted to squeeze in D59861 somewhere in the middle of your work, would you mind?
I'll definitely have a look at your patches soon :)

boga95 updated this revision to Diff 193705.Apr 4 2019, 7:13 AM

Rebase after https://reviews.llvm.org/D59861.
Fix custom filter test case: functions without definition always remove taintedness.

In general, the patch is looking alright, I'll take a second look later on. Don't mind my inlines too much, they are more directed towards the original code then your changes.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
50–53

Interesting, isn't this predefined for std::pair?

468–470

When do these happen? For implicit functions in C?

472–474

And this? Lambdas maybe?

548–550

Please add the full type name, else it isn't obvious why we don't take it as a const reference.

549–550

We don't use std::pair, so we have descriptive field names, do we need these?

lib/StaticAnalyzer/Checkers/Taint.h
25–28 ↗(On Diff #193705)

Is there a very good reason behind us not using an enum instead?

boga95 updated this revision to Diff 219981.Sep 12 2019, 12:55 PM
boga95 marked 4 inline comments as done.
NoQ added a comment.Sep 12 2019, 1:23 PM

I'd like the test cases to actually demonstrate the correct use of the filters and the correct behavior of the Analyzer when the filters are annotated correctly, but it looks to me that they either demonstrate behavior when the annotation is not used correctly, or we disagree about how the taint should work in the first place. Testing the behavior when the annotation is misplaced is fine (with enough comments and probably FIXMEs where applicable), but "positive" tests are more valuable because they are the actual common cases (hopefully).

clang/test/Analysis/taint-generic.c
381–385 ↗(On Diff #219981)

myFilter1 will create a new conjured symbol within x when evaluated conservatively. In this case there clearly shouldn't be a warning on Buffer[x] = 1, because nobody marked the new symbol as tainted to begin with. Therefore i think that this test doesn't really test the new functionality.

387–391 ↗(On Diff #219981)

Assuming myFilter2 is inlined, we have two execution paths here. On one of them x is reset to a concrete 0. Because concrete values cannot carry taint to begin with, this execution path doesn't test the new functionality. On the other path x indeed has the old tainted value but it's now constrained to [0, INT_MAX]. Because it's still not constrained to be less than BUFSIZE, i'd say this looks more like a false negative and the new functionality makes the analysis worse. So this test does test the new functionality, but it kinda makes the new functionality look bad rather than good.

393–397 ↗(On Diff #219981)

In this example myFilter3 promises not to alter the value of x due to const-qualifier on the pointee type of the parameter. Additionally, the function has no chance of preventing the program from reaching the buffer overflow line, other than crashing the program (given that it's C and there are no exceptions). Therefore i'd say this is also a false negative.

A better motivational example that doesn't immediately look like a false negative may look like this:

void testConfigurationFilter3() {
  int x = mySource1();
  if (isOutOfRange(x)) // the filter function
    return;
  Buffer[x] = 1; // no-warning
}

In this case the function looks at the value of x (there's really no need to pass it by pointer) and notifies the caller through its return value that it is unsafe to access the buffer with such index. This is probably the only situation where a "filter" annotation is actually worth it.

NoQ added inline comments.Sep 12 2019, 3:31 PM
clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
517 ↗(On Diff #219981)

Why not simply remove taint?

boga95 updated this revision to Diff 220385.Sep 16 2019, 1:50 PM
boga95 marked 4 inline comments as done.
boga95 marked 7 inline comments as done.Oct 3 2019, 1:48 PM

Ping

clang/test/Analysis/taint-generic.c
393–397 ↗(On Diff #219981)

It could work with C++ despite it hasn't supported any specific feature (yet).

I pass it by const int* because it currently doesn't support the value semantics :(. It has been already in my TODO list.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
50–53

Unfortunately, it's not.

468–470

For example, a lambda doesn't have an FunctionDecl.

472–474

I haven't found any example of this. It's just a copy-paste refactoring so I don't know the intentions behind that.

I think the checker should work perfectly without it.

lib/StaticAnalyzer/Checkers/Taint.h
25–28 ↗(On Diff #193705)

Unfortunately, enums cannot put into FoldingSet. It has to be a primitive type or a struct with the necessary functions. I can use enums and cast them to unsigned and back, but it isn't convenient.

boga95 marked 2 inline comments as done.Oct 9 2019, 2:37 PM

Ping

I think this patch is ok.

Although there are remarks:

  • I think the current implementation of the taint filtering functions does not follow the expected semantics. Now the modelling would remove taint before calling the function (pre statement). One might expect that the modelling would remove taint only on function return (post statement). This way we would not catch this:
int myFilter(int* p) { // this function is stated as a filter function in the config file
  int lookupTable[32] = {};
  int value = lookupTable[*p]; // using a tainted value for indexing
  escape(p);
  return value;
}
  • I agree with @NoQ about the testConfigurationFilter, which now does not test the implementation but the behavior of the static analyzer if it conservatively invalidates in case of an unknown function call.
  • I also think that we should have testcases for testing the filtering functionality of the config file. Eg. using the myFilter1 could do the job here in the tests.

All in all, I still say that it seems to be a good patch.

clang/test/Analysis/taint-generic.c
59 ↗(On Diff #220385)

Have you considered using _Bool instead?

NoQ accepted this revision.Nov 7 2019, 4:53 PM

@NoQ: "Why not simply remove taint?"
@boga95: *removes taint*
@NoQ: "Hmm, now that i think about it, adding a 'no taint' marker might actually work correctly more often."

Like, if you have taint on $x and try to remove taint from derived<$x, x.a>, your current implementation will do nothing. But the approach with adding a 'no taint' marker will actually add a new marker and make subsequent lookups to derived<$x, x.a> will return the newly added marker, which is the correct behavior; additionally, derived<$x, x.b> would remain tainted (where b != a), which is also the correct behavior. It would have still failed when you describe the sub-region slightly differently, but that'd be a fairly minor glitch.

The right way to proceed further with the removeTaint() approach on SymbolDerived is to introduce removePartialTaint() that would complement addPartialTaint(). But that will require changing the data structure in the program state from a simple set of tainted sub-regions to a sophisticated tree of sub-regions that are marked up as tainted or not tainted. Which might have as well been a marker.

I still tend to believe that removeTaint() is the right approach, but it's a bit harder to get right and a bit worse if not enough effort is invested into it.

There definitely is, however, a good use for the removeTaint() function in both approaches: namely, our taint checker still lacks checkDeadSymbols :D

clang/lib/StaticAnalyzer/Checkers/Taint.cpp
111–112 ↗(On Diff #220385)

That's not entirely true. Like, if you check (char)x, you still have 24 bytes of x controlled by the attacker. But that's a good false-positive-proof approach.

lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
468–470

Lambda most certainly has a FunctionDecl, which is the declaration of its operator()(), and that's exactly what's going to be in FDecl if a lambda is invoked. However, the getKind() of this FunctionDecl will not be Decl::Function but Decl::CXXMethod, like of any other member function.

So this second check is checking that the function is a simple global function.

I recommend replacing with !isa<CXXMethodDecl>(FDecl), purely for readability, or at least adding a comment.

Szelethus accepted this revision.Nov 8 2019, 5:30 AM

LGTM, provided that the inlines are addressed! Thanks!

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
102–103 ↗(On Diff #220385)

I recall that the current thinking is preferring CallEvent, though leave this as-is for now, because @steakhal might already have it locally.

105 ↗(On Diff #220385)

We might as well explain what the return value here means (and for the other functions too).

This revision is now accepted and ready to land.Nov 8 2019, 5:30 AM
boga95 marked 2 inline comments as done.Nov 16 2019, 6:30 AM

I did the required changes and tried to commit it, but I couldn't. I heard the codebase was migrated to GitHub. Maybe it affected my commit access.

NoQ added a comment.Nov 19 2019, 12:22 PM

You're now supposed to push directly to github.
You might also have missed http://lists.llvm.org/pipermail/cfe-dev/2019-August/063219.html.

This revision was automatically updated to reflect the committed changes.
boga95 marked 2 inline comments as done.