Functions fread and fwrite are now modeled with the possible
return values and stream error flags on error.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
151–157 | If we add methods to FnDescription, we might as well add this as well, but I don't insist. | |
205–206 | C'99 standard §7.19.9.4.3, about the ftell function:
So we need an evalCall for this. | |
489 | Isn't the state change redundant? We have a preCall to this function and we assert this as well. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
205–206 | I mean, this function can only fail if the stream itself is an erroneous or closed state, right? So we can associate the -1 return value with the stream being in that, and the other with the stream being opened. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
205–206 | The ftell is part of a later patch that is adding ftell (and probably other functions). Only the "PossibleErrors" was updated in this patch because its meaning was changed (it contains value even if only one error kind is possible, before it was used only if at least two errors are possible). | |
489 | We need to clarify what file operations are valid if the stream is in failed state or in EOF state. Again the question is if the function will simply fail again or it does not fail but does wrong things, or it will work correctly. Currently the initial stream error state is not taken into account for fread and fwrite, this needs to be fixed. For example for fread: "If an error occurs, the resulting value of the file position indicator for the stream is indeterminate." So at least a next read or write or ftell can not work correct after this (or works but with unusable result). |
Finally I had to make the decision to remove the ErrorKindTy enum and use boolean flags instead for every possible error (including no error). This is needed because we do not know always what error is possible if it is "unknown". It could be determined from the last operation but now not any more. The documentation for fread says that if 0 is passed as the size or count argument the function returns zero and does not change the state. So the error state before fread remains active. The new design is to have bool values for every error kind (feof and ferror and a no-error state) so multiple can be active in a state to indicate that more than one error state is possible. If no-error or feof is possible these flags are turned on. If we need to know in such a state if the stream is in EOF a state split is needed to handle both cases (one with EOF and one with no-error). This split must be done in the pre-call handler, for example if we want a warning that the operation is not safe to use in EOF state. (But in this case really no split is needed, only clear the EOF flag and make a warning.)
We can have another approach if we do not set return values of the functions at all, instead save the symbol of the function and determine the returned value later from the constraints actually applied on it. This may save state splits, but only until a condition is encountered that checks for the function's return value.
int rc = fread(buf, size, count, fp); if (rc < count) { int eof = feof(fp); }
In this code if the fread has no concrete value set, the if would result in state split to rc < count and rc >= count. If the fread sets the return value the same split is done at fread but not at the if. If there is no such if or only later, this saves some state splits. In the first case the possible error state of the stream may be hard to determine because it depends on the things that were done before the fread call too. (We can save the symbol of fread and count and ask at a later point if the condition fread < count is true to get if fread has failed. But need to know additionally if count or size was zero and if yes, use a previous error state.)
- Stream error is stored in separate boolean flags instead of enum.
- Removed PossibleErrors.
- Added "FilePositionIndeterminate".
- Updated fread, fwrite, fseek, clearerr.
- Added tests.
I gave a lot of thought to the high level logic, but I need some time to really internalize whats happening here.
I guess one solution would be to have a history of last operations on the stream, but overall, it makes sense to have a make a shift to calculate the possible errors as we're evaluating the functions. Great idea!
We can have another approach if we do not set return values of the functions at all, instead save the symbol of the function and determine the returned value later from the constraints actually applied on it. This may save state splits, but only until a condition is encountered that checks for the function's return value.
Or any stream modifying operation. If we don't have branches for the return value, that is almost a bug in itself. Also, digging the return value out of a branch condition may be difficult (see ReturnValueChecker). Lets stick with the state splits at the function call for now.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
37 | We're not describing the error state of a stream here, but rather possible error states, so the name should reflect that. | |
205–206 | In any case, this should be removed from the patch, because adding it seems to be an orthogonal change to this one. | |
207–210 | What does this mean? I guess not the possible states after an fread call, but rather the possible states after a failed fread call, but then... | |
225–228 | ...this doesn't make much sense, feof doesn't cause and error, it merely detects it. | |
478 | Will that be the next patch? :D Eager to see it! | |
512 | castAs doesn't return an Optional. | |
533–539 | This is where I'd really like to see a precise quote from the standard. C'99 standard, §7.19.8.1.3, the return value of fread: The fread function returns the number of elements successfully read, which may be |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
207–210 | Name of the parameter may be misleading, for evalFreadFwrite the last argument describes the new state after the operation has failed. | |
225–228 | At evalFeofFerror the last parameter simply indicates if it is the feof or the ferror case. | |
478 | It may be too big change to add it as well here (or cause more difficulties). But it should be a check for ErrorFEof in the ErrorState (and add another bug type). This is not a fatal error. It may be more difficult to make fread return the feof error again if it starts already with feof state. (Because the state with FEof should be split from the generic error state that can contain other possible errors.) |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
478 |
Oh, yea, right, we went over this once :) What I really meant, is an EOF related error, like reading a variable with an EOF value in any context that isn't a comparison to the actual EOF (which in many contexts still isn't a fatal error, but is far more an indicator of code smell). But I'm just thinking aloud, please proceed with this project however it is convenient to you!
I gave this plenty of thought, how do you imagine the problem of us not splitting up to 3 states to show up? Suppose we're calling fread on a stream where the error state is either EOF or ERROR, but we don't know which. We could just leave the StreamErrorState as-is, couldn't we? |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
37 | I do not think that StreamPossibleErrorState is better, it is anyway a state that can contain any information. It is an error related state even if it defines not one exact error. | |
151–157 | Now this is again the only function that could be member, in the current form the assert can be done but not in the member form. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
478 |
This could be another checker, or it can be integrated somehow into ErrorReturnChecker (that does something similar already). I mean, remembering if a variable contains EOF that comes from a function that may return EOF (otherwise the program can do nothing with a simple -1 numerical value without getting the warning), then if any other thing is done with it than comparison to EOF (or passing it to other function) it can be an error case. | |
478 |
|
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
478 |
Is it ever possible that an fread on a stream in FERROR returns non-zero and changes the streams state? Lets unpack this. §7.19.8.1.2 states that
§7.19.7.1.3 talks about the return value of fgetc:
Now, speaking of fgetc, its true that the standard doesn't specifically talk about what happens if the stream is already in error state, yet it does mention what happens if it is in eof-state, but then again, I guess its strongly implied that if the stream is an error state, a read error will occur right away, and the error state is preserved as specified. If fread is little more then sequential calls to fgetc, I would believe that a stream is in FERROR, FERROR will be preserved, because fgetc wouldn't be able to read a single character. Additionally, as mentioned in another inline, §7.19.8.1.3 talks about the return value of fread:
so that should be 0. To conclude, unless I'm wrong (which is totally possible, I'm just throwing standard cites around wanting to prove how I hope these things work), both types of stream errors would be preserved after a call to fread, and the return value would always be zero. |
Pre-statement checks for fread and fwrite filter out the cases when the position is "indeterminate", because in those states it is fatal error to call the function. Otherwise if not only the EOF flag is set initially (at start of fread) in ErrorState the fread should not generate state when other errors are possible (or none) (this contains an execution where initially EOF is set, after the function no EOF is set or FERROR is set). So the EOF initial state must be handled separately, probably split from the rest of the error state.
I am not sure if this FEOF thing is such a big problem. The modeling is not exact but practically this does not cause big problems and still better than before this patch. The problem is, if a fread is made and the stream is already in EOF state the (modeled) fread may still succeed or fail again with other error, according to the current way of modeling it. The calling program must handle still every possible error after fread regardless what error was there before. I am not sure if false positives can appear because this issue.
I'm sorry for the late review -- please know that this isn't the first time me taking a look, this is a complex issue. I find navigating your phabricator comments a bit difficult, it would help a lot if you would accompany them with a lot of code examples, and a lot more tests in the actual patch.
Error handling is super annoying with streams. Take a look at this:
#include "stdio.h" int main() { FILE *F = fopen("test.cpp", "r"); putc('c', F); printf("error: %i\n", ferror(F)); char Buf[1024]; const int READ_COUNT = 5; if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F)) printf("eof: %i\n", ferror(F)); else printf("%s\n", Buf); fclose(F); }
$ clang++ test.cpp && ./a.out error: 1 #incl
Should we allow this? It seems like the error flag was set, but the actual problem is unrelated to the read, so the read still works fine.
I feel like we're constantly changing what we're shooting for, and that implies immediate code changes. Let's take a step back, draw a state machine (no error, feof, ferror, indeterminate state), and just define what is an error, what is a code smell, what is perfectly fine, because we don't seem to be on the clear on this. We need to follow and agree on the C standard in a very literal sense, and we're not there just yet. The amount and placement of the state splits seem to cause a lot of confusion and we're not ready for that discussion just yet.
D70470 is a great example for a patch with a state machine.
I read through this many times but I just don't understand what you mean, could you rephrase this please?
Is there a significant gain behind doing that instead of doing a state split immediately? It feels unnatural to delay the split. It doesn't accurately represent how the code would run. Are we supposed to do this at every precall? What about other checkers that may rely on this one?
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
37 | Alright, you convinced me. | |
105 | The standard suggests that this happens on FERROR, not FEOF.
It would be wild if the stream wouldn't set the FEOF flag after reaching the end of the file. Also, I would imagine FEOF being usually implemented with the file position indicator. $ cat test.cpp` #include "stdio.h" int main() { FILE *F = fopen("test.cpp", "r"); char Buf[1024]; const int READ_COUNT = 99999; if (READ_COUNT != fread(Buf, sizeof(char), READ_COUNT, F)) { printf("%i\n", feof(F)); } } $ build/bin/clang++ test.cpp && ./a.out 1 Operations on an EOF and indeterminate stream are very different. |
I think this is a expected way of how it works, failure of a stream operation does not necessarily depend on result of a previous operation. So any operation can fail or not, independent of the previous error state (the "ferror" may happen because a temporary disk error and it is worth to retry the read, at least a non-fread function where the "indeterminate file position" problem does not happen). This would make things more simple. Only the EOF is probably exception, in EOF state any read operation fails with EOF. (But I do not know if it is possible that a stream in EOF state becomes non-EOF because new data appears at the end, if other program can write into it or it is some form of "pipe" or special file. If this is true even EOF needs no special handling.)
The intent is to model the fread-fwrite function failure by returning the error value and set the stream into error state. The error state is a composite of ferror and feof. The questions are now, at what case do these functions fail and with what error type?
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
105 | I wanted to say in that comment that FilePositionIndeterminate should be ignored if the ErrorState is ErrorFEof. In other words the file is never indeterminate if in EOF state. |
Right. But, speaking about generality, would it be reasonably to assume that all (or almost all) operations may also result in an indeterminate file indicator? Because if so, we could, and totally should strip those changes from the fread/fwrite ones and make a new revision.
That is by the way true for a lot of other changes, such as the delayed state split or the new error state struct. I mean not to burden you by micromanaging a lot of small patches, but it makes reviewing far faster for me.
(But I do not know if it is possible that a stream in EOF state becomes non-EOF because new data appears at the end, if other program can write into it or it is some form of "pipe" or special file. If this is true even EOF needs no special handling.)
That is a great question, but I strongly believe that it cannot without freopen. I mean, if I were to create an io library, I would set a pointer to the beginning and the end of the file, and know we hit eof is the current pointer hits the end point. Besides, how could anyone change the FILE * object in a non-analyzable way?
In any case, its a reasonable to assume it won't change.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
893–895 | Since this isn't a bug but rather a code smell, a note would be a nice compliment. |
The problem with small changes is that still we (at least I who writes the code) should know the final goal (and this can be hard if I have multiple not related problems to work on and everything goes forward by little pieces spread out in a long time). (Also the reviewer needs to know why a change is made if it is not obvious because it is part of a bigger to-be-added "logic".) Otherwise something already done can turn out to be wrong later, as is the case already now with these changes. My goal was to add the file handling functions one by one and adjust the existing code to work with the current implemented functions (probably something already added must be changed to adapt to the new conditions).
The indeterminate position is mentioned only at fread and fwrite. I do not know if it is reasonable to make the indeterminate position in other cases. Th indeterminate position is separate from error flags because of "clearerr" that clears the error flag but not the indeterminate position (if this is the right way of how it works), and because even ferror does not imply an indeterminate position always like after a write to a file opened in read mode. Then it is a detail question after what functions to set indeterminate position and before what functions check for it, the support for it is still needed. The current way of its handling seems correct to me, indeterminate position is not allowed before read or write but no problem before fseek (at least with an absolute seek value, this is not checked now).
We can assume that size of a file does not change by external (to the analyzed program) events so the EOF state can not disappear. (Is this true for stdin, or it has never an EOF?)
Adding a state chart may help something but does not tell the full picture because conditions on the transitions are not trivial to indicate and follow. And the state chart is too complex to draw it in ASCII-art format.
Created new revisions for parts of this change and a bit improved: D80009 and others in "Stack".
Is there a significant gain behind doing that instead of doing a state split immediately? It feels unnatural to delay the split. It doesn't accurately represent how the code would run. Are we supposed to do this at every precall? What about other checkers that may rely on this one?
The actual gain is that we somewhat reduce the exponential growth of the exploded graph. With such a Schrödinger's cat pattern we delay it to the point where it is actually checked, it checked at all.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
39 | bool NoError:1 = true etc. Why not use bit fields for a struct of boolans? |
We're not describing the error state of a stream here, but rather possible error states, so the name should reflect that.