Page MenuHomePhabricator

[analyzer] Add InvalidPtrChecker
Needs ReviewPublic

Authored by zukatsinadze on Mar 1 2021, 9:05 AM.

Details

Summary

This patch introduces a new checker: alpha.security.cert.env.InvalidPtr

Checker finds usage of invalidated pointers.

Based on the following SEI CERT Rules:
ENV34-C: https://wiki.sei.cmu.edu/confluence/x/8tYxBQ
ENV31-C: https://wiki.sei.cmu.edu/confluence/x/5NUxBQ

Diff Detail

Event Timeline

zukatsinadze created this revision.Mar 1 2021, 9:05 AM
zukatsinadze requested review of this revision.Mar 1 2021, 9:05 AM

Please suggest which package to use for the checker.
CERT rules are ENV, however, it deals with non-ENV functions as well.

Also, I am having a problem with checkDeadSymbols, it is similar to one xazax.hun faced here: http://reviews.llvm.org/D14203 (many many years ago)
envp memory region is marked dead too early in case of aliasing. Please check the snippets, the second one is problematic:

int main(int argc, char **argv, char *envp[]) {
  putenv((char*) "NAME=VALUE"); // envp invalidated
  envp[0]; // gives error
}

int main(int argc, char **argv, char *envp[]) {
  char **e = envp;
  putenv((char*) "NAME=VALUE"); // envp invalidated
  e[0]; // does not give error :(

  // warnOnDeadSymbol reports 'envp' dead here
} 

int main(int argc, char **argv, char *envp[]) {
  char **e = envp;
  putenv((char*) "NAME=VALUE"); // envp invalidated
  e[0]; // gives error again

  /*
    use 'envp' somehow here
  */
}

Fixed docs.

Attaching results of CodeChecker run on some projects.

Removed code repetition from the tests.

NoQ added a comment.Mar 3 2021, 5:19 PM

CERT rules are typically very vague and don't map nicely into specific static analysis algorithms. A lot of CERT rules flag valid code as well as bugs. I want you to explain what *exactly* does the checker checks (say, in the form of a state machine, i.e. what sequences of events in the program does it find) and whether flagged code is expected to be always invalid.

// warnOnDeadSymbol reports 'envp' dead here

If it reports the symbol as dead after the warning would have been emitted then the problem must be elsewhere. There must be other differences in the analysis *before* the buggy statement. Nothing that happens after the buggy statement should ever matter.

Also the behavior of checkDeadSymbols looks correct here, I don't see any problems from your description. The symbol is reported dead when there's a guarantee that it won't be encountered anymore during analysis. If the parameter variable and the aliasing variable aren't used anymore and the symbol isn't stored anywhere else then the symbol is correctly marked as dead.

Why are you even tracking reg_$0<envp>? It's obvious from the structure of the symbol that it's an environment pointer, there's no need to keep it in maps.

clang/docs/analyzer/checkers.rst
2103–2104

It's very important to explain whether the "unanticipated behavior" is an entirely psychological concept ("most people don't understand how this works but this can occasionally also be a valid solution if you know what you're doing") or a 100% clear indication of a bug ("such code can never be correct", eg. it instantly causes undefined behavior, or the written value can never be read later so there's absolutely no point in writing it, or something like that).

@NoQ, thanks for the comments.

In D97699#2601804, @NoQ wrote:

... and whether flagged code is expected to be always invalid.

C standard says "may be overwritten", so I guess it's undefined behavior.

In D97699#2601804, @NoQ wrote:

... I want you to explain what *exactly* does the checker checks ...

There are two problems that are checked.
First: if we have 'envp' argument of main present and we modify the environment in some way (say setenv, putenv...), it "may" reallocate the whole environment memory, but 'envp' will still point to the old location.
Second: if we have a pointer pointing to the result of 'getenv' (and some other non-reentrant functions, there are 5 of them implemented now, but can be easily extended to others, since checker is general enough) and we call 'getenv' again the previous result "may" be overwritten.
The idea of the checker is simple and somewhat comparable to MallocChecker or UseAfterFree (in a sense that 'free' invalidates region). We have a map of previous function call results and after a subsequent call, we invalidate the previous region. And we warn on dereference of such pointer or if we have it as an argument to function call (to avoid some false negatives), which is not inlined.
I will add a description in a checker file.

And about checkDeadSymbols, I get your point, but I am interested why checker has different behavior on these two examples:

int main(int argc, char **argv, char *envp[]) {
  putenv((char*) "NAME=VALUE"); // envp invalidated
  envp[0]; // gives error
}
int main(int argc, char **argv, char *envp[]) {
  char **e = envp;
  putenv((char*) "NAME=VALUE"); // envp invalidated
  e[0]; // does not give error
}

If I manually force keeping envp region in state when I check if (!SymReaper.isLiveRegion(E)) { , then it reports as expected.

In D97699#2601804, @NoQ wrote:

Why are you even tracking reg_$0<envp>? It's obvious from the structure of the symbol that it's an environment pointer, there's no need to keep it in maps.

envp is confusable with argv: int main(int argc, char *argv[], char *envp[]) so we obtain envp's region from main() to match it later.

clang/docs/analyzer/checkers.rst
2103–2104

The standard terminology is very vague, like that:

The getenv function returns a pointer to a string associated with the matched list member. The string pointed to shall not be modified by the program but may be overwritten by a subsequent call to the getenv function.

What the hell. 😕 I think it is about table-doubling and reallocating the entire environment pointer table at some point which makes sense in case of the non-getter function calls. For the getters I think another processes could overwrite the environ pointer between two getter calls and problem could occur because of such table-doubling. To resolve the issue we need to call strdup() and create a copy of the current environment entry instead of having a pointer to it as I see.

@zukatsinadze it would be great to see a real reference with real issues in real world software. Is the true evil the multiple chained getter calls?

zukatsinadze added inline comments.Sun, Mar 21, 5:16 PM
clang/docs/analyzer/checkers.rst
2103–2104

it would be great to see a real reference with real issues in real world software

I've attached some results up in the thread. Checker gave several valuable reports on several projects if that's what you mean.

Is the true evil the multiple chained getter calls?

Besides SEI CERT, there are many other reputable resources stating the same problem with getenv.

https://www.ibm.com/support/knowledgecenter/SSLTBW_2.2.0/com.ibm.zos.v2r2.bpxbd00/getenv.htm
https://www.keil.com/support/man/docs/armlib/armlib_chr1359122850667.htm
https://pubs.opengroup.org/onlinepubs/009696799/functions/getenv.html
https://pubs.opengroup.org/onlinepubs/7908799/xsh/getenv.html
https://www.gnu.org/software/libc/manual/html_node/Environment-Access.html
https://man7.org/linux/man-pages/man3/getenv.3p.html

zukatsinadze added inline comments.Sun, Mar 21, 5:49 PM
clang/docs/analyzer/checkers.rst
2103–2104

Actual problem is that getenv is returning a pointer to its internal static buffer instead of giving pointer to environ, that's why the string will change with a subsequent call.

zukatsinadze edited the summary of this revision. (Show Details)

Gentle ping

  • Diff with context
  • Added some more tests
  • Updated documentation