This is a way to handle stream error state in StreamChecker.
This is initial and work-in-progress,
only store of the error is implemented and create of error states
for function 'fseek'. This principle should work for other functions
and for testing if a function is called in error state.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
50–68 | The intention was that the error state is only stored when the stream is opened (in opened state). Additionally it exists in the map only if there is error, so no "NoError" kind is needed. This is only to save memory, if it is not relevant I can move the error information into StreamState (that will contain two enums then). | |
92–125 | This is an "interesting" solution for the problem that there is need for a function with 3 return values. The constructor performs the task of the function: Create a conjured value (and get the various objects for it). The output values are RetVal and RetSym, and the success state, and the call expr that is computed here anyway. It could be computed independently but if the value was retrieved once it is better to store it for later use. (I did not check how costly that operation is.) I had some experience that using only getReturnValue and make constraints on that does not work as intended, and the reason can be that we need to bind a value for the call expr otherwise it is an unknown (undefined?) value (and not the conjured symbol)? |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
50–68 | I personally wouldn't worry about memory consumption in this case too much, considering how much information needs to be store for simple expressions, stream objects are relatively few and far in between, even on projects that use them a lot. Having one more enum in StreamState would be better in this case then! :) | |
92–125 | I suspect that getReturnValue might only work in postCall, but I'm not sure. I think instead of this class, a function returning a std::tuple would be nicer, with std::tie on the call site. You seem to use all 3 returns values in the functions that instantiate MakeRetVal anyways :). In StdLibraryFunctionsChecker's evalCall, the return value is explicitly constructed, and further constraints on it are only imposed in postCall. I wonder why that is. @martong, any idea why we don't apply the constraints for pure functions in evalCall? | |
383 | According to the C'98 standard §7.19.9.2.5:
So it definitely doesn't clear the EOF flag on failure. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
92–125 | The return value case is not as simple because the DefinedSVal has no default constructor, but it is not as bad to return only the RetVal and have a CE argument. | |
383 | Yes it does say nothing about what happens with EOF flag on failure, so it should be is better to not change it. And we do not know if it is possible to get an EOF error (seek to after the end of file?). |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
40–41 | Could you please move these to the individual enums please? :) I have an indexer that can query those as documentation. | |
46–49 | These too. Also, I'm not yet sure whether we need OtherError and AnyError, as stated in a later inline. | |
92–125 | I like the current solution very much! | |
369–371 | If we check in preCall whether the stream is opened why don't we conservatively assume it to be open? | |
387–390 | But why? The standard suggests that the error state isn't changed upon failure. I think we should leave it as-is. | |
472–473 | Right here, should we just assume a stream to be opened when we can't prove otherwise? ensureStreamOpened is only called when we are about to evaluate a function that assumes the stream to be opened anyways, so I don't expect many false positive lack of fclose reports. |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
369–371 | If we do that we get a resource leak error for example in the test function pr8081 (there is only a call to fseek). We can assume that if the function gets the file pointer as argument it does not "own" it so the close is not needed. Otherwise the false positive resource leaks come always in functions that take a file argument and use file operations. Or this case (the file pointer is passed as input argument, the file is not opened by the function that is analyzed) can be handled specially, we can assume that there is no error initially and closing the file is not needed. | |
387–390 | The fseek can fail and set the error flag, see the example code here: | |
472–473 | The warning is created only if we know that the stream is not opened. This function makes no difference if the stream is "tracked" (found in StreamMap) or not. In the not-tracked case it is the same as if it were opened. Probably the function can be changed to take a StreamState instead of StreamVal and the not-tracked case can be handled separately. Or this function can add the new StreamState in (opened state). |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
46–49 | I plan to use AnyError for failing functions that can have EOF and other error as result. At a later ferror or feof call a new state split follows to differentiate the error (instead of having to add 3 new states after the function, for success, EOF error and other error). If other operation is called we know anyway that some error happened. | |
369–371 | This can be done in a next change. It involves more changes at other places. I think of inserting the state for the stream if it was not there before. But we need to save if this was such an insert or a normal fopen (and do not report resource leak for the "insert" case). |
Okay, I think we're mostly in agreement so far -- could we implement a warning and add some test files for unchecked stream states after a failed fseek call?
The title of the revision is "[Analyzer][StreamChecker] Introduction of stream error state handling.", yet it is mostly about fseek, even if the intention is to demonstrate how error state handling would look like through its better modeling. How about we change the title to "[Analyzer][StreamChecker] Model fseek better by introducing stream error states"?
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
46–49 | I think it would still be better to introduce them as we find uses for them :) Also, to stay in the scope of this patch, shouldn't we only introduce FseekError? If we did, we could make warning messages a lot clearer. | |
369–371 |
Consider me convinced :) | |
387–390 | Yup, right, you won :) I tried some examples out on my system, and it could preserve or change the error state of the stream. To me it seems like that not checking the state of the stream after a failed fseek is surely unadvised. This still should be AnyError (or FseekError), as according to the OtherError's description OtherError may not refer to EOF, yet after a failed fseek call the stream can be in EOF: $ cat test.cpp #include <cstdio> int main() { FILE *F = fopen("test.cpp", "r"); while (EOF != fgetc(F)) {} if (feof(F)) printf("The file is closed!\n"); // Return value is non-zero on failure. if (fseek(F, -100000, SEEK_END)) { if (feof(F)) printf("The file is still closed!\n"); else printf("The file is no longer closed!\n"); } } $ clang test.cpp && ./a.out
The file is closed!
The file is still closed! |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
46–49 | This change is generally about introduction of the error handling, not specifically about fseek. Probably not fseek was the best choice, it could be any other function. Probably I can add another, or multiple ones, but then again the too-big-patch problem comes up. (If now the generic error states exist the diffs for adding more stream operations could be more simple by only adding new functions and not changing the StreamState at all.) (How is this related to warning messages?) | |
387–390 | After some experimenting I think it is best to have every possibility of errors after fseek. This means, either EOF, or other-error, or non-EOF and not other-error (but failed fseek call according to return value). So we need 1 success and 2 error return value states, one error return with AnyError and one with NoError (strange but happens according to the shown output). Probably there is relation between the previous state and the produced result of fseek but I do not want to figure out, it may be different in other systems and the documentations say nothing. This comes from this program: #include <stdio.h> void print_result(FILE *F, int rc, const char *errtxt) { printf("--------\n%s", errtxt); if (rc) { printf("failed...\n"); if (feof(F)) { printf("FEOF\n"); } if (ferror(F)) { printf("FERROR\n"); perror("error"); } } else { printf("success\n"); } } int main() { FILE *F = fopen("fseek.c", "r"); while (EOF != fgetc(F)) {} print_result(F, 1, "read done\n"); // Return value is non-zero on failure. int rc = fseek(F, -100000, SEEK_END); print_result(F, rc, "seek invalid\n"); rc = fseek(F, 2, SEEK_END); print_result(F, rc, "seek valid\n"); rc = fseek(F, -100000, SEEK_END); print_result(F, rc, "seek invalid\n"); fputs("str", F); print_result(F, 1, "failed operation\n"); rc = fseek(F, -100000, SEEK_END); print_result(F, rc, "seek invalid\n"); rc = fseek(F, -1, SEEK_END); print_result(F, rc, "seek valid\n"); rc = fseek(F, -100000, SEEK_END); print_result(F, rc, "seek invalid\n"); } And the result is: -------- read done failed... FEOF -------- seek invalid failed... FEOF -------- seek valid success -------- seek invalid failed... -------- failed operation failed... FERROR error: Bad file descriptor -------- seek invalid failed... FERROR error: Invalid argument -------- seek valid success -------- seek invalid failed... FERROR error: Invalid argument |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
46–49 | For now, the EofError and OtherError can be removed, in this change these are not used (according to how fseek will be done). |
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
46–49 |
You could not have picked a better function! Since the rules around the error state of the stream after a failed fseek call are quite complex, few functions deserve their own error state more.
I had the following image in my head, it could be used at the bug report emission site to give a precise description: if (SS->isInErrorState()) { switch(SS.getErrorKind) { case FseekError: reportBug("After a failed fseek call, the state of the stream may " "have changed, and it might be feof() or ferror()!"); break; case EofError: reportBug("The stream is in an feof() state!"); break; case Errorf: reportBug("The stream is in an ferror() state!"); break; case OtherError: // We don't know what the precise error is, but we surely // know its in one. reportBug("The stream is in an error state!"); break; }
For the last case in the code snippet (OtherError), I'm not too sure what the conditions are -- when do we know what the stream state is (some sort of an error), but not know precisely how? In the case of fseek, we don't precisely know what the state is but we know how it came about. I just don't yet see why we need a generic error state.
I think the point of the patch is to demonstrate the implementation of an error state, not to implement them all, and it does it quite well!
I agree! |
To avoid problems I created a new version of this diff too that follows after the other new ones:
https://reviews.llvm.org/D75682
(Adding a new diff can make inline comment positions even more inexact specially if the diff has many differences from an older one?)
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp | ||
---|---|---|
92–125 |
We could apply them in evalCall technically. |
I'd strongly prefer to finish this and move on after that -- we have the discussion here, and a great looking patch with only a few things to address.
I have "mirrored" all 3 changes in this stack to the new series in D75682. Probably it is possible to reuse these revisions instead but I do not know if it will not confuse phabricator somehow (and how phabricator behaves in such "tricky" cases, there is not a usable documentation for it). The D75682 is the one that should be used now, it is the same change as this one plus the ferror and feof functions and tests. The new part with ferror and feof can be done in a new change but these belong logically into this change to make the error handling complete and testable (this change in current form is not good for tests).
If this patch is supposed to be a followup to D75682, could you please mark it as such? I find these revisions difficult to navigate.
I have "mirrored" all 3 changes in this stack to the new series in D75682. Probably it is possible to reuse these revisions instead but I do not know if it will not confuse phabricator somehow (and how phabricator behaves in such "tricky" cases, there is not a usable documentation for it).
Since this is the patch where we held the discussion about error states, I think it would be better for this revision land first, that would also solve the problem of inlines being all over the place. It doesn't really matter whether we're introducing error states first through feof and ferror, or the admittedly quirky fseek. WDYT?
Could you please move these to the individual enums please? :) I have an indexer that can query those as documentation.