This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][StreamChecker] Added evaluation of fread and fwrite.
AbandonedPublic

Authored by balazske on Apr 17 2020, 8:05 AM.

Details

Summary

Functions fread and fwrite are now modeled with the possible
return values and stream error flags on error.

Diff Detail

Event Timeline

balazske created this revision.Apr 17 2020, 8:05 AM
Szelethus requested changes to this revision.Apr 19 2020, 10:31 AM
Szelethus added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
152–158

If we add methods to FnDescription, we might as well add this as well, but I don't insist.

222

C'99 standard §7.19.9.4.3, about the ftell function:

If successful, the ftell function returns the current value of the file position indicator for the stream. On failure, the ftell function returns -1L and stores an implementation-defined positive value in errno.

So we need an evalCall for this.

485

Isn't the state change redundant? We have a preCall to this function and we assert this as well.

This revision now requires changes to proceed.Apr 19 2020, 10:31 AM
Szelethus added inline comments.Apr 19 2020, 10:34 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
222

I mean, this function can only fail if the stream itself is an erroneous or closed state, right? So we can associate the -1 return value with the stream being in that, and the other with the stream being opened.

balazske marked 2 inline comments as done.Apr 20 2020, 12:06 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
222

The ftell is part of a later patch that is adding ftell (and probably other functions). Only the "PossibleErrors" was updated in this patch because its meaning was changed (it contains value even if only one error kind is possible, before it was used only if at least two errors are possible).
It is not sure how the file operations work, maybe the ftell needs access to some data that is not available at the moment (and can fail even if the stream was not OK). If the stream is already in failed state the question is: Do ftell fail (then the we can make it return -1) or it does not fail but gives an unusable value (then generate a checker warning and stop the analysis).

485

We need to clarify what file operations are valid if the stream is in failed state or in EOF state. Again the question is if the function will simply fail again or it does not fail but does wrong things, or it will work correctly. Currently the initial stream error state is not taken into account for fread and fwrite, this needs to be fixed. For example for fread: "If an error occurs, the resulting value of the file position indicator for the stream is indeterminate." So at least a next read or write or ftell can not work correct after this (or works but with unusable result).

Finally I had to make the decision to remove the ErrorKindTy enum and use boolean flags instead for every possible error (including no error). This is needed because we do not know always what error is possible if it is "unknown". It could be determined from the last operation but now not any more. The documentation for fread says that if 0 is passed as the size or count argument the function returns zero and does not change the state. So the error state before fread remains active. The new design is to have bool values for every error kind (feof and ferror and a no-error state) so multiple can be active in a state to indicate that more than one error state is possible. If no-error or feof is possible these flags are turned on. If we need to know in such a state if the stream is in EOF a state split is needed to handle both cases (one with EOF and one with no-error). This split must be done in the pre-call handler, for example if we want a warning that the operation is not safe to use in EOF state. (But in this case really no split is needed, only clear the EOF flag and make a warning.)

We can have another approach if we do not set return values of the functions at all, instead save the symbol of the function and determine the returned value later from the constraints actually applied on it. This may save state splits, but only until a condition is encountered that checks for the function's return value.

int rc = fread(buf, size, count, fp);
if (rc  < count) {
  int eof = feof(fp);
}

In this code if the fread has no concrete value set, the if would result in state split to rc < count and rc >= count. If the fread sets the return value the same split is done at fread but not at the if. If there is no such if or only later, this saves some state splits. In the first case the possible error state of the stream may be hard to determine because it depends on the things that were done before the fread call too. (We can save the symbol of fread and count and ask at a later point if the condition fread < count is true to get if fread has failed. But need to know additionally if count or size was zero and if yes, use a previous error state.)

balazske updated this revision to Diff 259491.Apr 23 2020, 12:45 AM
  • Stream error is stored in separate boolean flags instead of enum.
  • Removed PossibleErrors.
  • Added "FilePositionIndeterminate".
  • Updated fread, fwrite, fseek, clearerr.
  • Added tests.

I gave a lot of thought to the high level logic, but I need some time to really internalize whats happening here.

Finally I had to make the decision to remove the ErrorKindTy enum and use boolean flags instead for every possible error (including no error). This is needed because we do not know always what error is possible if it is "unknown". It could be determined from the last operation but now not any more. The documentation for fread says that if 0 is passed as the size or count argument the function returns zero and does not change the state. So the error state before fread remains active. The new design is to have bool values for every error kind (feof and ferror and a no-error state) so multiple can be active in a state to indicate that more than one error state is possible. If no-error or feof is possible these flags are turned on. If we need to know in such a state if the stream is in EOF a state split is needed to handle both cases (one with EOF and one with no-error). This split must be done in the pre-call handler, for example if we want a warning that the operation is not safe to use in EOF state. (But in this case really no split is needed, only clear the EOF flag and make a warning.)

I guess one solution would be to have a history of last operations on the stream, but overall, it makes sense to have a make a shift to calculate the possible errors as we're evaluating the functions. Great idea!

We can have another approach if we do not set return values of the functions at all, instead save the symbol of the function and determine the returned value later from the constraints actually applied on it. This may save state splits, but only until a condition is encountered that checks for the function's return value.

Or any stream modifying operation. If we don't have branches for the return value, that is almost a bug in itself. Also, digging the return value out of a branch condition may be difficult (see ReturnValueChecker). Lets stick with the state splits at the function call for now.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
35

We're not describing the error state of a stream here, but rather possible error states, so the name should reflect that.

222

In any case, this should be removed from the patch, because adding it seems to be an orthogonal change to this one.

227–230

What does this mean? I guess not the possible states after an fread call, but rather the possible states after a failed fread call, but then...

237–245

...this doesn't make much sense, feof doesn't cause and error, it merely detects it.

474

Will that be the next patch? :D Eager to see it!

508

castAs doesn't return an Optional.

529–535

This is where I'd really like to see a precise quote from the standard.

C'99 standard, §7.19.8.1.3, the return value of fread:

The fread function returns the number of elements successfully read, which may be
less than nmemb if a read error or end-of-file is encountered. If size or nmemb is zero,
fread returns zero and the contents of the array and the state of the stream remain
unchanged.

balazske marked 3 inline comments as done.Apr 29 2020, 3:00 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
227–230

Name of the parameter may be misleading, for evalFreadFwrite the last argument describes the new state after the operation has failed.

237–245

At evalFeofFerror the last parameter simply indicates if it is the feof or the ferror case.

474

It may be too big change to add it as well here (or cause more difficulties). But it should be a check for ErrorFEof in the ErrorState (and add another bug type). This is not a fatal error.

It may be more difficult to make fread return the feof error again if it starts already with feof state. (Because the state with FEof should be split from the generic error state that can contain other possible errors.)

balazske updated this revision to Diff 262058.May 5 2020, 4:02 AM
balazske marked 2 inline comments as done.

Small review related fixes.

Szelethus added inline comments.May 5 2020, 12:23 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
474

It may be too big change to add it as well here (or cause more difficulties). But it should be a check for ErrorFEof in the ErrorState (and add another bug type). This is not a fatal error.

Oh, yea, right, we went over this once :) What I really meant, is an EOF related error, like reading a variable with an EOF value in any context that isn't a comparison to the actual EOF (which in many contexts still isn't a fatal error, but is far more an indicator of code smell). But I'm just thinking aloud, please proceed with this project however it is convenient to you!

It may be more difficult to make fread return the feof error again if it starts already with feof state. (Because the state with FEof should be split from the generic error state that can contain other possible errors.)

I gave this plenty of thought, how do you imagine the problem of us not splitting up to 3 states to show up? Suppose we're calling fread on a stream where the error state is either EOF or ERROR, but we don't know which. We could just leave the StreamErrorState as-is, couldn't we?

balazske added inline comments.May 6 2020, 3:09 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
35

I do not think that StreamPossibleErrorState is better, it is anyway a state that can contain any information. It is an error related state even if it defines not one exact error.

152–158

Now this is again the only function that could be member, in the current form the assert can be done but not in the member form.

balazske marked 2 inline comments as done.May 6 2020, 3:28 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
474

reading a variable with an EOF value in any context that isn't a comparison to the actual EOF

This could be another checker, or it can be integrated somehow into ErrorReturnChecker (that does something similar already). I mean, remembering if a variable contains EOF that comes from a function that may return EOF (otherwise the program can do nothing with a simple -1 numerical value without getting the warning), then if any other thing is done with it than comparison to EOF (or passing it to other function) it can be an error case.

474

Suppose we're calling fread on a stream where the error state is either EOF or ERROR, but we don't know which. We could just leave the StreamErrorState as-is, couldn't we?

  • If fread is called with FEOF flag, it returns 0 and FEOF remains.
  • If fread is called with FERROR flag, it may fail (with any error) or not. This is similar as calling fread in no-error state.
Szelethus added inline comments.May 6 2020, 4:59 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
474
  • If fread is called with FERROR flag, it may fail (with any error) or not. This is similar as calling fread in no-error state.

Is it ever possible that an fread on a stream in FERROR returns non-zero and changes the streams state? Lets unpack this.

§7.19.8.1.2 states that

The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read. If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.

§7.19.7.1.3 talks about the return value of fgetc:

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

Now, speaking of fgetc, its true that the standard doesn't specifically talk about what happens if the stream is already in error state, yet it does mention what happens if it is in eof-state, but then again, I guess its strongly implied that if the stream is an error state, a read error will occur right away, and the error state is preserved as specified.

If fread is little more then sequential calls to fgetc, I would believe that a stream is in FERROR, FERROR will be preserved, because fgetc wouldn't be able to read a single character. Additionally, as mentioned in another inline, §7.19.8.1.3 talks about the return value of fread:

The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered. If size or nmemb is zero, fread returns zero and the contents of the array and the state of the stream remain unchanged.

so that should be 0.

To conclude, unless I'm wrong (which is totally possible, I'm just throwing standard cites around wanting to prove how I hope these things work), both types of stream errors would be preserved after a call to fread, and the return value would always be zero.

balazske added a comment.EditedMay 6 2020, 10:49 AM

Pre-statement checks for fread and fwrite filter out the cases when the position is "indeterminate", because in those states it is fatal error to call the function. Otherwise if not only the EOF flag is set initially (at start of fread) in ErrorState the fread should not generate state when other errors are possible (or none) (this contains an execution where initially EOF is set, after the function no EOF is set or FERROR is set). So the EOF initial state must be handled separately, probably split from the rest of the error state.

I am not sure if this FEOF thing is such a big problem. The modeling is not exact but practically this does not cause big problems and still better than before this patch. The problem is, if a fread is made and the stream is already in EOF state the (modeled) fread may still succeed or fail again with other error, according to the current way of modeling it. The calling program must handle still every possible error after fread regardless what error was there before. I am not sure if false positives can appear because this issue.

balazske updated this revision to Diff 263447.May 12 2020, 8:39 AM

Added state split before fread to make warning for error state possible.

balazske marked 2 inline comments as done.May 12 2020, 8:50 AM
Szelethus requested changes to this revision.May 13 2020, 4:58 AM

I'm sorry for the late review -- please know that this isn't the first time me taking a look, this is a complex issue. I find navigating your phabricator comments a bit difficult, it would help a lot if you would accompany them with a lot of code examples, and a lot more tests in the actual patch.

Error handling is super annoying with streams. Take a look at this:

#include "stdio.h"

int main() {
  FILE *F = fopen("test.cpp", "r");
  putc('c', F);
  printf("error: %i\n", ferror(F));
  char Buf[1024];
  const int READ_COUNT = 5;
  if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F))
    printf("eof: %i\n", ferror(F));
  else
    printf("%s\n", Buf);
  fclose(F);
}
$ clang++ test.cpp && ./a.out 
error: 1
#incl

Should we allow this? It seems like the error flag was set, but the actual problem is unrelated to the read, so the read still works fine.

I feel like we're constantly changing what we're shooting for, and that implies immediate code changes. Let's take a step back, draw a state machine (no error, feof, ferror, indeterminate state), and just define what is an error, what is a code smell, what is perfectly fine, because we don't seem to be on the clear on this. We need to follow and agree on the C standard in a very literal sense, and we're not there just yet. The amount and placement of the state splits seem to cause a lot of confusion and we're not ready for that discussion just yet.

D70470 is a great example for a patch with a state machine.

Pre-statement checks for fread and fwrite filter out the cases when the position is "indeterminate", because in those states it is fatal error to call the function. Otherwise if not only the EOF flag is set initially (at start of fread) in ErrorState the fread should not generate state when other errors are possible (or none) (this contains an execution where initially EOF is set, after the function no EOF is set or FERROR is set). So the EOF initial state must be handled separately, probably split from the rest of the error state.

I read through this many times but I just don't understand what you mean, could you rephrase this please?

Added state split before fread to make warning for error state possible.

Is there a significant gain behind doing that instead of doing a state split immediately? It feels unnatural to delay the split. It doesn't accurately represent how the code would run. Are we supposed to do this at every precall? What about other checkers that may rely on this one?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
35

Alright, you convinced me.

68

The standard suggests that this happens on FERROR, not FEOF.

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate. If a partial element is read, its value is indeterminate.

It would be wild if the stream wouldn't set the FEOF flag after reaching the end of the file. Also, I would imagine FEOF being usually implemented with the file position indicator.

$ cat test.cpp`
#include "stdio.h"

int main() {
  FILE *F = fopen("test.cpp", "r");
  char Buf[1024];
  const int READ_COUNT = 99999;
  if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F)) {
    printf("%i\n", feof(F));
  }
}
$ build/bin/clang++ test.cpp && ./a.out 
1

Operations on an EOF and indeterminate stream are very different.

This revision now requires changes to proceed.May 13 2020, 4:58 AM

I think this is a expected way of how it works, failure of a stream operation does not necessarily depend on result of a previous operation. So any operation can fail or not, independent of the previous error state (the "ferror" may happen because a temporary disk error and it is worth to retry the read, at least a non-fread function where the "indeterminate file position" problem does not happen). This would make things more simple. Only the EOF is probably exception, in EOF state any read operation fails with EOF. (But I do not know if it is possible that a stream in EOF state becomes non-EOF because new data appears at the end, if other program can write into it or it is some form of "pipe" or special file. If this is true even EOF needs no special handling.)

balazske marked 2 inline comments as done.May 13 2020, 8:52 AM

The intent is to model the fread-fwrite function failure by returning the error value and set the stream into error state. The error state is a composite of ferror and feof. The questions are now, at what case do these functions fail and with what error type?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68

I wanted to say in that comment that FilePositionIndeterminate should be ignored if the ErrorState is ErrorFEof. In other words the file is never indeterminate if in EOF state.

I think this is a expected way of how it works, failure of a stream operation does not necessarily depend on result of a previous operation. So any operation can fail or not, independent of the previous error state (the "ferror" may happen because a temporary disk error and it is worth to retry the read, at least a non-fread function where the "indeterminate file position" problem does not happen). This would make things more simple. Only the EOF is probably exception, in EOF state any read operation fails with EOF.

Right. But, speaking about generality, would it be reasonably to assume that all (or almost all) operations may also result in an indeterminate file indicator? Because if so, we could, and totally should strip those changes from the fread/fwrite ones and make a new revision.

That is by the way true for a lot of other changes, such as the delayed state split or the new error state struct. I mean not to burden you by micromanaging a lot of small patches, but it makes reviewing far faster for me.

(But I do not know if it is possible that a stream in EOF state becomes non-EOF because new data appears at the end, if other program can write into it or it is some form of "pipe" or special file. If this is true even EOF needs no special handling.)

That is a great question, but I strongly believe that it cannot without freopen. I mean, if I were to create an io library, I would set a pointer to the beginning and the end of the file, and know we hit eof is the current pointer hits the end point. Besides, how could anyone change the FILE * object in a non-analyzable way?

In any case, its a reasonable to assume it won't change.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
802–804

Since this isn't a bug but rather a code smell, a note would be a nice compliment.
"Use clearerr() to clear the error flag"

The problem with small changes is that still we (at least I who writes the code) should know the final goal (and this can be hard if I have multiple not related problems to work on and everything goes forward by little pieces spread out in a long time). (Also the reviewer needs to know why a change is made if it is not obvious because it is part of a bigger to-be-added "logic".) Otherwise something already done can turn out to be wrong later, as is the case already now with these changes. My goal was to add the file handling functions one by one and adjust the existing code to work with the current implemented functions (probably something already added must be changed to adapt to the new conditions).

The indeterminate position is mentioned only at fread and fwrite. I do not know if it is reasonable to make the indeterminate position in other cases. Th indeterminate position is separate from error flags because of "clearerr" that clears the error flag but not the indeterminate position (if this is the right way of how it works), and because even ferror does not imply an indeterminate position always like after a write to a file opened in read mode. Then it is a detail question after what functions to set indeterminate position and before what functions check for it, the support for it is still needed. The current way of its handling seems correct to me, indeterminate position is not allowed before read or write but no problem before fseek (at least with an absolute seek value, this is not checked now).

balazske added a comment.EditedMay 14 2020, 12:04 AM

We can assume that size of a file does not change by external (to the analyzed program) events so the EOF state can not disappear. (Is this true for stdin, or it has never an EOF?)

Adding a state chart may help something but does not tell the full picture because conditions on the transitions are not trivial to indicate and follow. And the state chart is too complex to draw it in ASCII-art format.

Created new revisions for parts of this change and a bit improved: D80009 and others in "Stack".

Is there a significant gain behind doing that instead of doing a state split immediately? It feels unnatural to delay the split. It doesn't accurately represent how the code would run. Are we supposed to do this at every precall? What about other checkers that may rely on this one?

The actual gain is that we somewhat reduce the exponential growth of the exploded graph. With such a Schrödinger's cat pattern we delay it to the point where it is actually checked, it checked at all.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
37

bool NoError:1 = true etc. Why not use bit fields for a struct of boolans?

We can close this now, right?

balazske abandoned this revision.Jun 3 2020, 8:25 AM

Yes, closing it. (Will change StreamErrorState probably to bitfield later.)