This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Improved check of `fgetc` in StreamChecker.
AbandonedPublic

Authored by balazske on Feb 24 2020, 5:23 AM.

Details

Summary

A warning is produced if the getc call fails (returns EOF) and
the next called stream function is not feof and ferror
(both are needed).
This check is inspired by CERT rule FIO34-C "Distinguish between characters read from a file and EOF or WEOF".
https://wiki.sei.cmu.edu/confluence/display/c/FIO34-C.+Distinguish+between+characters+read+from+a+file+and+EOF+or+WEOF

Other similar functions (and wide versions) are to be added.

Event Timeline

balazske created this revision.Feb 24 2020, 5:23 AM
Herald added a project: Restricted Project. · View Herald Transcript
martong added inline comments.Feb 24 2020, 7:08 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
356

Maybe adding a comment like false /*Feof*/ could make this more readable. Even better, using two different strong types instead of the bool args could make it even more safe, but if that's too much hassle then it is may be not worth (I don't know if LLVM has a proper infrastructure for strong types).

clang/test/Analysis/stream.c
182

Do you have tests for getc as well? Maybe we could have at least one for getc too.

218

Perhaps Should call ferror is more informative for the readers of the tests.

Szelethus requested changes to this revision.Feb 24 2020, 7:16 AM

I'm a bit confused as to what this patch aims to do. Once again, I'd kindly ask you to discuss for a bit before updating this patch.


The rule states that after reaching EOF both ferror and feof should be called. I'm a bit confused here -- isn't this problem a property of EOF rather then getc() specifically? Also, expecting each maintainer of this code to call checkErrorState whenever a new stream function is implemented as well as keeping in mind that state transitions should be made against a new ExplodedNode makes me think that we could do better.

I think we should do some ground work before progressing further. The checker currently doesn't implement the state where we know that the stream is an error state (we got EOF from it). Shouldn't we do that first? After that, the rule you want to enforce may be more appropriate to originate from checkDeadSymbols when a stream in said state isn't checked properly, which seems to be a lot more in line with the rule.


The test file contains tests where EOF isn't encountered once but we're emitting a warning that ferror and feof should've been called, yet the rule states that

Calling both functions on each iteration of a loop adds significant overhead, so a good strategy is to temporarily trust EOF and WEOF within the loop but verify them with feof() and ferror() following the loop.

Could we have the test cases from the CERT site, or something similar with a loop?


One thing that worries me is compliant non-portable solution:

#include <assert.h>
#include <stdio.h>
#include <limits.h>
 
void func(void) {
  int c;
  static_assert(UCHAR_MAX < UINT_MAX, "FIO34-C violation");
 
  do {
    c = getchar();
  } while (c != EOF);
}

We would totally emit errors for this. There are some solutions to this:

  • Assume that the user will turn off this checker in this case.
  • Check the TranslationUnitDecl for static asserts. If not present, suggest to put it in. This would make this warning more appropriate to come from the portability package.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
255–261

I'm 99% sure that the Preprocessor is retrievable in the checker registry function (register*Checker), and this code should be moved there, and EOFv be made non-mutable.

clang/test/Analysis/stream.c
188

Are we sure that this is the error message we want to emit here? Sure, not checking whether the stream is in an error state is a recipe for disaster, but this seems to clash with

Should call 'feof' and 'ferror' after failed character read.

as error here is not even checking the return value. Shouldn't we say

Should check the return value of fgetc before ...

instead?

246–247

I bet this isn't how we envision how these function to be used. Maybe ReturnValueChecker could be deployed here? In any case, that would be a topic of a different revision.

This revision now requires changes to proceed.Feb 24 2020, 7:16 AM
balazske marked 4 inline comments as done.Feb 25 2020, 3:22 AM

An improvement can be made if the check runs only when size of char and int are equal. And it is possible to avoid the warning if fgetc is called after fgetc (with no functions in between). (More complicated thing is to test for loop construct and maybe not more efficient.)

The code can be improved somewhat but there are now too many special cases and there is already a default eval function. Probably it is better to have every "check" function take and return ExplodedNode instead of state.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
255–261

I am not sure if this would be good. The Preprocessor is not available easily and the "register" function is probably there to register a checker and do nothing else. Probably the source code is not even known when it is called. And how to pass the EOF value to the created checker in the register function (if not a static variable is used)?

clang/test/Analysis/stream.c
182

getc is not done yet. It can need more change in StreamChecker to handle functions on std streams (at least stdin).

188

No, it is not enough to check the return value of fgetc, exactly this is the reason for the checker. "Should call 'feof' and 'ferror' after failed character read." is more exact: The read was suspected to fail because EOF was returned but should call feof (...) to verify it. A better message can be:

If 'fgetc' returns EOF call 'feof' and 'ferror' to check for error.

246–247

If fgetc returns non-EOF we can be sure that there is no error, so no need to call ferror and feof. If it returns EOF we must "double-check" that it was a real error or an EOF-looking character that was read.

NoQ added a comment.Feb 25 2020, 3:36 AM

An improvement can be made if the check runs only when size of char and int are equal.

How many such platforms can you name? Please correct me if i'm wrong but it sounds like we're adding a lot of logic to maintain for a very very low-value check.

NoQ added a comment.Feb 25 2020, 3:38 AM
In D75045#1891055, @NoQ wrote:

How many such platforms can you name?

I mean, does Clang even support such targets?

In D75045#1891056, @NoQ wrote:
In D75045#1891055, @NoQ wrote:

How many such platforms can you name?

I mean, does Clang even support such targets?

The problem can occur with the "wide" variants of the getc functions when wchar_t is same width as wint_t, this can be more often (but I am not sure). Even then the character set should contain a character similar to WEOF, that is not true for UTF-16 and 32. So there may be low probability for the problem to occur. Is this reason to not have this check at all?

Szelethus added a comment.EditedFeb 25 2020, 4:39 AM
In D75045#1891056, @NoQ wrote:
In D75045#1891055, @NoQ wrote:

How many such platforms can you name?

I mean, does Clang even support such targets?

The problem can occur with the "wide" variants of the getc functions when wchar_t is same width as wint_t, this can be more often (but I am not sure). Even then the character set should contain a character similar to WEOF, that is not true for UTF-16 and 32. So there may be low probability for the problem to occur. Is this reason to not have this check at all?

If we can detect that the code is non-portable, it doesn't really matter in my opinion if we don't support the target on which the problem would occur. Also, I can't come up with a specific platform that would have an issue like this (maybe in a dishwasher? (: ), but CERT is known to create rules based on real vulnerabilities. A quick glance lead me to this page, though these vulnerabilities may not be directly related to this rule:

https://wiki.sei.cmu.edu/confluence/display/c/AA.+Bibliography#AA.Bibliography-ISO/IECTS17961-2013

An improvement can be made if the check runs only when size of char and int are equal.

Since the rule specifically states that the problem here is portability related, I think we shouldn't do that.

The code can be improved somewhat but there are now too many special cases and there is already a default eval function. Probably it is better to have every "check" function take and return ExplodedNode instead of state.

This really ties into my main concern, that this isn't a stream management function specific problem, but rather an unchecked stream problem after we know that it returns EOF. I think we should first emit warnings on operations on EOF-returning streams, and implement this rule after that. WDYT?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
255–261

CheckerManager::registerChecker in fact returns the non-const checker object. The registry function is responsible for the registration and initialization of the checker object.

Though you have a point in the sense that this isn't terribly well defined (I should really finish D58065...).

I had a look, and my usual motto of "Any manager can be retrieved in clang at arbitrary locations if you try hard enough" seems to have been wrong. I don't see how I could get my hands on a Preprocessor in the checker registry function. So, feel free to disregard this comment.

clang/test/Analysis/stream.c
188

The compliant solution in the rule description has the following example:


#include <stdio.h>
 
void func(void) {
  int c;
 
  do {
    c = getchar();
  } while (c != EOF);
  if (feof(stdin)) {
    /* Handle end of file */
  } else if (ferror(stdin)) {
    /* Handle file error */
  } else {
    /* Received a character that resembles EOF; handle error */
  }
}

because if the character resembles EOF that is an error too, and also

Calling both functions on each iteration of a loop adds significant overhead, so a good strategy is to temporarily trust EOF and WEOF within the loop but verify them with feof() and ferror() following the loop.

Which makes me thing that the error here is not checking against EOF, first and foremost.

The warning message

If 'fgetc' returns EOF call 'feof' and 'ferror' to check for error.

sounds great, I would just prefer to see it after we know that the stream returned EOF.

246–247

Yea, but shouldn't we check the return values of ferror and feof? Otherwise whats the point?

balazske marked an inline comment as done.Feb 25 2020, 8:16 AM

So, I think a different approach can be to store the stream state for the stream instead of ShouldCallX variables. (The state can be no-error, eof-error, other-error.) Optionally a warning can be made if any (modeled) stream function is called and the stream is in error state (and handle feof and ferror specially). (This is optional because it is not necessary to check the error always, it should not cause crash or undefined behavior. If a function is called in already failed stream state, it should simply fail again, or even not. Retrying a read for example can be correct. To detect if there is any error checking in the code the "ErrorReturnChecker" in https://reviews.llvm.org/D72705 is the better solution.) A special case is if the last called function was getc and the stream is in error state, here we know that it is necessary to call ferror or feof.

clang/test/Analysis/stream.c
246–247

Yes but this is job of an other checker (return value not used). This here is to ensure that the functions are called at least.

clang/test/Analysis/stream.c
188

I agree. The checker should emit a warning if and only if the return value of fgetc() equals to EOF. Another checker should ensure that the return value of fgetc() is indeed compared to EOF. However: are we sure that there are no cases were no other error may happen than EOF thus we intentionally omit further checking? (Imagine a simple read from the keyboard in a command line utility.)

balazske planned changes to this revision.Aug 18 2020, 9:44 PM

Something like this can be added to StreamChecker but it has changed in many places so a new review would be needed. The main problem is with this fgetc check is that it would be good to check the standard stream version getc too, but that is hard to implement because global nature of the standard streams (stdin).

balazske abandoned this revision.Aug 3 2021, 1:09 AM

The idea of this change still applies but because checker and other code changes it is better to start again.