This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add more propagations to Taint analysis
ClosedPublic

Authored by gamesh411 on Feb 22 2022, 4:34 PM.

Details

Summary

Add more functions as taint propators to GenericTaintChecker.

Diff Detail

Event Timeline

gamesh411 created this revision.Feb 22 2022, 4:34 PM
gamesh411 requested review of this revision.Feb 22 2022, 4:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2022, 4:34 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Huh, this was a long one. 😅 🚀

clang/docs/analyzer/checkers.rst
2361

I cannot see the corresponding propagation rule.
That being said, it would be handy to mention that this is for zlib decompression and this should be probably a taint source anyway.

vfscanf occurs two times.

vscanf is not mentioned here; and there are probably a couple others like this.

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
586

This function has exactly 3 arguments.
I'm also puzzled how tainting va_list will work out. That should be modeled separately IMO.
This comment applies to all of the similar va_list accepting functions, such as vscanf, vfscanf, and possibly others.

That being said, I think vscanf is more like scanf, so it should be modeled as a taint source instead of a propagator.

598

I'm on board with marking read operations as props/sources.
Let's look at the declaration: ssize_t readv(int fd, const struct iovec *iov, int iovcnt);
I'm not sure how could the pointee of iov be modified by this call, as its const.
Additionally, I doubt the effectiveness of the rule, since I don't think it would be too likely to have a path leading to a taint sink with an iov pointer. That being said, let it be, but don't expect much from this rule :D

600

It's sad that we don't model recvmsg, but it would be quite difficult to model. I can see why you left that out.

602–603

I'm not sure how could these be used as gadgets, but let it be.

605–606

These should be sorted, isn't it?

607

int fnmatch(const char *pattern, const char *string, int flags);
The fnmatch() function checks whether the string argument matches the pattern argument, which is a shell wildcard pattern (see glob(7)).

From this man entry I would think that we should propagate from the second argument.

619

One could say that if memmem was called with a tainted needle and actually finds something in the haystack, that would mean essentially that the value pointed by the return value has the same content as the needle.
However, I still agree with the current propagation rule, depending only on arg 0.
This might reasoning might deserve a comment though.
strstr also shares this property.

629
638–639

IMO these should propagate from {0,1}, similarly how the strncmp does propagate from its last argument.

clang/test/Analysis/taint-generic.c
491–497

This test case definitely needs to be reworked.

  1. We don't have FDs in this context, unlike the comment suggests.
  2. vscanf should be an unconditional taint source, instead of being a propagator.
511–514

For this type of things I can recommend using the clang_analyzer_isTainted(T) debug.ExprInspection introspection function.
It will do the right thing, without sinking the execution path.

I've seen in some other test cases that you used an extra top-level fn parameter for the same.
That's slightly better than using a global, but I would still recommend the debug function.

518–519

Please use single-line comments.
It makes debugging test files easier in some cases.

527

clang_analyzer_isTainted(*iov)

1024

You should group these all into a single test case.
This way the setup code for the test dominates compared to the actual content.

1046–1047

The comparison function must return an integer less than, equal to, or greater than zero if the first argument is considered to be respectively less than, equal to, or greater than the second.

1058–1061

Just use the cmp_less instead.
You can also fuse the qsort test cases into a single function.

gamesh411 updated this revision to Diff 412057.Mar 1 2022, 4:58 AM
gamesh411 marked 17 inline comments as done.
  • remove vscanf and co.
  • use debug.ExprInspection for test cases
  • fix semantic issues for modeled functions
gamesh411 added inline comments.Mar 1 2022, 4:58 AM
clang/docs/analyzer/checkers.rst
2361

Removed vscanf and vfscanf as modeling their taint is not straightforward and should be done in another checker.
The problem with those is that they do not use variadic arguments, but an abstraction of those implemented by the type va_list, which is used to support invocations, where the number of arguments is determined at runtime.
In addition to modeling individual v-variants we will also have to handle the creation of va_list objects, and va_start function calls and try to reason about which parameters will be tainted.

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
586

removed

598

On second thought, this seems pointless indeed, removed.

605–606

swapped, thx

clang/test/Analysis/taint-generic.c
518–519

removed

527

test case removed

steakhal accepted this revision.Mar 1 2022, 5:15 AM
steakhal added a reviewer: NoQ.

Looks good to me. Check the docs if they are still in sync.
Also, postpone this for two days to let others block this if they have objections.
Other than that, land it.

This revision is now accepted and ready to land.Mar 1 2022, 5:15 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 7 2022, 2:10 AM
This revision was automatically updated to reflect the committed changes.