This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add more sources to Taint analysis
ClosedPublic

Authored by gamesh411 on Feb 21 2022, 1:36 AM.

Details

Summary

Add more functions as taint sources to GenericTaintChecker.

Diff Detail

Event Timeline

gamesh411 created this revision.Feb 21 2022, 1:36 AM
gamesh411 requested review of this revision.Feb 21 2022, 1:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 21 2022, 1:36 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Sorry for the wall-of-nits. Overall, looks great.

clang/docs/analyzer/checkers.rst
2358

typo/dup?
I cannot recognize the getcw() call. Could you please refer to the specification or an instance where it was defined?

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
552–573

If we handle gets, scanf, we should also model the *_s versions as well.

char *gets_s(char *s, rsize_t n);
int scanf_s(const char *restrict format, ...);
int fscanf_s(FILE *restrict stream, const char *restrict format, ...);
int sscanf_s(const char *restrict s, const char *restrict format, ...);
int vscanf_s(const char *restrict format, va_list arg);
int vfscanf_s(FILE *restrict stream, const char *restrict format, va_list arg);
int vsscanf_s(const char *restrict s, const char *restrict format, va_list arg);
557–559

IMO these functions are highly domain-specific.
On errors, they return specific/well-defined values e.g. -1, '?' or ':'.
That being said, the analyzer does not have this knowledge, thus it will model these as conjured symbols.
If these values were modeled as tainted, we would likely introduce the number of false-positives regarding our limited capabilities of modeling the function accurately.

tldr; I'm against these three rules; or alternatively prove that my concerns are not issues on real code bases.

563

We should check readlinkat as well.

566

In what cases can this function introduce taint?

568

On success, getgroups() returns the number of supplementary group IDs. On error, -1 is returned, and errno is set appropriately.

According to this, the return value index should be also tainted.

clang/test/Analysis/taint-generic.c
496

Well, this can never return zero.
What we should do is to do a state-split for the failure case; since the application should definitely handle a failure in this part; thus the split would be justified.

503

Sometimes we taint the fd and propagate based on that, and othertimes, we simply just return taint.
However, I think it still looks like a better tradeoff this way.
I just wanted to highlight this. Maybe a comment on the CallDescription would be beneficial in describing this discrepancy.

524

Well, it can never be zero AFAIK. It might be -1 on error, or the number of bytes stored into buf.
So, I would rather modify the example to 1 / (s + 1).

528

Please avoid spelling the given function in the name of the test directly in a verbatim manner.
If we would use the CDF_MaybeBuiltin matching mode, the get_current_dir_name_is_source would match for the CallDescription {"get_current_dir_name"} due to the way we fuzzy match for builtins.
Prefixing with Test like testGet_current_dir_name would be fine though, only the underscores are handled differently.
This applies to the rest of the test cases as well.

Also great to see that the docs are in-sync to the code, which is a great plus!

steakhal retitled this revision from Add more sources to Taint analysis to [analyzer] Add more sources to Taint analysis.Feb 21 2022, 5:47 AM

The pre-merge checks failed due to an unrelated test case: Driver/hip-link-bundle-archive.hip (on Windows); so I think we are good to go on that part.

gamesh411 updated this revision to Diff 410747.Feb 23 2022, 1:56 AM
gamesh411 marked 9 inline comments as done.
  • s/getcw/getwd
  • add gets_s
  • remove getopt variants
  • add realinkat
  • discuss getnameinfo?
  • rename tests
  • update getnameinfo
  • comment on source/propagator discrepancy
  • update tests where 1 / tainted, and tainted cannot be 0
  • renamed tests
clang/docs/analyzer/checkers.rst
2358

getwd is the right one instead of getcw

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
552–573

I have added gets_s, and will add the _s variants for the others in the other patch that deals with the propagatorsj.

557–559

I agree that the handling of these should be in another checker, I remember some false positives ( mainly uninteresting, typical "just won't fix" errors ) relating to getopt, and a domain-specific checker could be more appropriate here.
Removed them.

563

Added

566

The getnameinfo converts from

struct sockaddr_in {
    sa_family_t    sin_family; /* address family: AF_INET */
    in_port_t      sin_port;   /* port in network byte order */
    struct in_addr sin_addr;   /* internet address */
};

/* Internet address */
struct in_addr {
    uint32_t       s_addr;     /* address in network byte order */
};

to hostname and servername strings.
One could argue that by crafting a specific IP address, that is known to resolve to a specific hostname in the running environment could lead an attacker injecting a chosen (in some circumstances arbitrary) string into the code at the point of this function.

I know this is a bit contrived, and more on the cybersecurity side of things, so I am not sure whether to add this here, or add this in a specific checker, or just leave altogether. Please share your opinion about this.

568

Added

clang/test/Analysis/taint-generic.c
496

Removed

503

Added a comment

528

fixed the test names

Fewer nits this time.
We are converging!

clang/lib/StaticAnalyzer/Checkers/GenericTaintChecker.cpp
566

int readlinkat(int dirfd, const char *pathname, char *buf, size_t bufsiz);

566

Let it be, I don't mind. We will remove it if we find some FPs for this.

clang/test/Analysis/taint-generic.c
374

Please, also rename this test case.

gamesh411 updated this revision to Diff 411763.Feb 28 2022, 2:04 AM

add readlinkat
rename _IO_getc testcase

gamesh411 marked 3 inline comments as done.Feb 28 2022, 2:06 AM

readlinkat fix incoming

gamesh411 updated this revision to Diff 411765.Feb 28 2022, 2:14 AM

fix readlinkat arg index
extend testcase for readlinkat

gamesh411 marked an inline comment as done.Feb 28 2022, 2:14 AM
steakhal accepted this revision.Feb 28 2022, 2:27 AM

Please check the docs if they are still in sync with the content.
I think it looks great overall.

This revision is now accepted and ready to land.Feb 28 2022, 2:27 AM
This revision was landed with ongoing or failed builds.Feb 28 2022, 2:33 AM
This revision was automatically updated to reflect the committed changes.