This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][StreamChecker] Introduction of stream error state handling.
AbandonedPublic

Authored by balazske on Feb 28 2020, 8:17 AM.

Details

Summary

This is a way to handle stream error state in StreamChecker.
This is initial and work-in-progress,
only store of the error is implemented and create of error states
for function 'fseek'. This principle should work for other functions
and for testing if a function is called in error state.

Diff Detail

Event Timeline

balazske created this revision.Feb 28 2020, 8:17 AM
Herald added a project: Restricted Project. · View Herald Transcript
Szelethus added inline comments.Mar 2 2020, 6:54 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
71–89

Shouldn't we merge this with StreamState?

93–126

Do you have other patches that really crave the need for this class? Why isn't CallEvent::getReturnValue sufficient? This is a legitimate question, I really don't know. :)

balazske marked 2 inline comments as done.Mar 2 2020, 7:18 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
71–89

The intention was that the error state is only stored when the stream is opened (in opened state). Additionally it exists in the map only if there is error, so no "NoError" kind is needed. This is only to save memory, if it is not relevant I can move the error information into StreamState (that will contain two enums then).

93–126

This is an "interesting" solution for the problem that there is need for a function with 3 return values. The constructor performs the task of the function: Create a conjured value (and get the various objects for it). The output values are RetVal and RetSym, and the success state, and the call expr that is computed here anyway. It could be computed independently but if the value was retrieved once it is better to store it for later use. (I did not check how costly that operation is.)

I had some experience that using only getReturnValue and make constraints on that does not work as intended, and the reason can be that we need to bind a value for the call expr otherwise it is an unknown (undefined?) value (and not the conjured symbol)?

Szelethus added inline comments.Mar 2 2020, 10:13 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
71–89

I personally wouldn't worry about memory consumption in this case too much, considering how much information needs to be store for simple expressions, stream objects are relatively few and far in between, even on projects that use them a lot.

Having one more enum in StreamState would be better in this case then! :)

93–126

I suspect that getReturnValue might only work in postCall, but I'm not sure.

I think instead of this class, a function returning a std::tuple would be nicer, with std::tie on the call site. You seem to use all 3 returns values in the functions that instantiate MakeRetVal anyways :).

In StdLibraryFunctionsChecker's evalCall, the return value is explicitly constructed, and further constraints on it are only imposed in postCall. I wonder why that is. @martong, any idea why we don't apply the constraints for pure functions in evalCall?

346

According to the C'98 standard §7.19.9.2.5:

After determining the new position, a successful call to the fseek function undoes any effects of the ungetc function on the stream, clears the end-of-file indicator for the stream, and then establishes the new position. After a successful fseek call, the next operation on an update stream may be either input or output.

So it definitely doesn't clear the EOF flag on failure.

balazske marked 2 inline comments as done.Mar 3 2020, 12:49 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
93–126

The return value case is not as simple because the DefinedSVal has no default constructor, but it is not as bad to return only the RetVal and have a CE argument.

346

Yes it does say nothing about what happens with EOF flag on failure, so it should be is better to not change it. And we do not know if it is possible to get an EOF error (seek to after the end of file?).

balazske updated this revision to Diff 247814.Mar 3 2020, 12:51 AM

Removed MakeRetVal, fixed a bug in evalFseek.

evalFseek is to be updated further.

balazske updated this revision to Diff 247824.Mar 3 2020, 1:47 AM
balazske marked 2 inline comments as done.

Updated StreamState to include the error state.

balazske marked 2 inline comments as done.Mar 3 2020, 1:50 AM
Szelethus added inline comments.Mar 3 2020, 2:40 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
32–54

Could you please move these to the individual enums please? :) I have an indexer that can query those as documentation.

38–41

These too. Also, I'm not yet sure whether we need OtherError and AnyError, as stated in a later inline.

93–126

I like the current solution very much!

332–334

If we check in preCall whether the stream is opened why don't we conservatively assume it to be open?

350–353

But why? The standard suggests that the error state isn't changed upon failure. I think we should leave it as-is.

437–438

Right here, should we just assume a stream to be opened when we can't prove otherwise? ensureStreamOpened is only called when we are about to evaluate a function that assumes the stream to be opened anyways, so I don't expect many false positive lack of fclose reports.

balazske marked 4 inline comments as done.Mar 3 2020, 5:43 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
332–334

If we do that we get a resource leak error for example in the test function pr8081 (there is only a call to fseek). We can assume that if the function gets the file pointer as argument it does not "own" it so the close is not needed. Otherwise the false positive resource leaks come always in functions that take a file argument and use file operations. Or this case (the file pointer is passed as input argument, the file is not opened by the function that is analyzed) can be handled specially, we can assume that there is no error initially and closing the file is not needed.

350–353

The fseek can fail and set the error flag, see the example code here:
https://en.cppreference.com/w/c/io/fseek
Probably it can not set the EOF flag, according to that page it is possible to seek after the end of the file. So the function can succeed or fail with OtherError.

437–438

The warning is created only if we know that the stream is not opened. This function makes no difference if the stream is "tracked" (found in StreamMap) or not. In the not-tracked case it is the same as if it were opened. Probably the function can be changed to take a StreamState instead of StreamVal and the not-tracked case can be handled separately. Or this function can add the new StreamState in (opened state).

balazske updated this revision to Diff 248155.Mar 4 2020, 5:40 AM
  • Moved enum comments.
  • Updated fseek behaviour.
balazske marked 5 inline comments as done.Mar 4 2020, 5:48 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
38–41

I plan to use AnyError for failing functions that can have EOF and other error as result. At a later ferror or feof call a new state split follows to differentiate the error (instead of having to add 3 new states after the function, for success, EOF error and other error). If other operation is called we know anyway that some error happened.

332–334

This can be done in a next change. It involves more changes at other places. I think of inserting the state for the stream if it was not there before. But we need to save if this was such an insert or a normal fopen (and do not report resource leak for the "insert" case).

Okay, I think we're mostly in agreement so far -- could we implement a warning and add some test files for unchecked stream states after a failed fseek call?

The title of the revision is "[Analyzer][StreamChecker] Introduction of stream error state handling.", yet it is mostly about fseek, even if the intention is to demonstrate how error state handling would look like through its better modeling. How about we change the title to "[Analyzer][StreamChecker] Model fseek better by introducing stream error states"?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
38–41

I think it would still be better to introduce them as we find uses for them :) Also, to stay in the scope of this patch, shouldn't we only introduce FseekError? If we did, we could make warning messages a lot clearer.

332–334

This can be done in a next change.

Consider me convinced :)

350–353

Yup, right, you won :) I tried some examples out on my system, and it could preserve or change the error state of the stream. To me it seems like that not checking the state of the stream after a failed fseek is surely unadvised.

This still should be AnyError (or FseekError), as according to the OtherError's description OtherError may not refer to EOF, yet after a failed fseek call the stream can be in EOF:

$ cat test.cpp
#include <cstdio>

int main() {
  FILE *F = fopen("test.cpp", "r");

  while (EOF != fgetc(F)) {}

  if (feof(F))
    printf("The file is closed!\n");

  // Return value is non-zero on failure.
  if (fseek(F, -100000, SEEK_END)) {
    if (feof(F))
      printf("The file is still closed!\n");
    else
      printf("The file is no longer closed!\n");
  }
}
$ clang test.cpp && ./a.out
The file is closed!
The file is still closed!
balazske marked 8 inline comments as done.Mar 5 2020, 12:27 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
38–41

This change is generally about introduction of the error handling, not specifically about fseek. Probably not fseek was the best choice, it could be any other function. Probably I can add another, or multiple ones, but then again the too-big-patch problem comes up. (If now the generic error states exist the diffs for adding more stream operations could be more simple by only adding new functions and not changing the StreamState at all.) (How is this related to warning messages?)

350–353

After some experimenting I think it is best to have every possibility of errors after fseek. This means, either EOF, or other-error, or non-EOF and not other-error (but failed fseek call according to return value). So we need 1 success and 2 error return value states, one error return with AnyError and one with NoError (strange but happens according to the shown output). Probably there is relation between the previous state and the produced result of fseek but I do not want to figure out, it may be different in other systems and the documentations say nothing.

This comes from this program:

#include <stdio.h>

void print_result(FILE *F, int rc, const char *errtxt) {
  printf("--------\n%s", errtxt);
  if (rc) {
    printf("failed...\n");
    if (feof(F)) {
      printf("FEOF\n");
    }
    if (ferror(F)) {
      printf("FERROR\n");
      perror("error");
    }
  } else {
    printf("success\n");
  }
}

int main() {
  FILE *F = fopen("fseek.c", "r");

  while (EOF != fgetc(F)) {}
  print_result(F, 1, "read done\n");

  // Return value is non-zero on failure.
  int rc = fseek(F, -100000, SEEK_END);
  print_result(F, rc, "seek invalid\n");

  rc = fseek(F, 2, SEEK_END);
  print_result(F, rc, "seek valid\n");

  rc = fseek(F, -100000, SEEK_END);
  print_result(F, rc, "seek invalid\n");
  
  fputs("str", F);
  print_result(F, 1, "failed operation\n");

  rc = fseek(F, -100000, SEEK_END);
  print_result(F, rc, "seek invalid\n");

  rc = fseek(F, -1, SEEK_END);
  print_result(F, rc, "seek valid\n");

  rc = fseek(F, -100000, SEEK_END);
  print_result(F, rc, "seek invalid\n");
}

And the result is:

--------
read done
failed...
FEOF
--------
seek invalid
failed...
FEOF
--------
seek valid
success
--------
seek invalid
failed...
--------
failed operation
failed...
FERROR
error: Bad file descriptor
--------
seek invalid
failed...
FERROR
error: Invalid argument
--------
seek valid
success
--------
seek invalid
failed...
FERROR
error: Invalid argument
balazske marked 2 inline comments as done.Mar 5 2020, 12:32 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
38–41

For now, the EofError and OtherError can be removed, in this change these are not used (according to how fseek will be done).

Szelethus added inline comments.Mar 5 2020, 3:48 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
38–41

This change is generally about introduction of the error handling, not specifically about fseek. Probably not fseek was the best choice, it could be any other function.

You could not have picked a better function! Since the rules around the error state of the stream after a failed fseek call are quite complex, few functions deserve their own error state more.

(How is this related to warning messages?)

I had the following image in my head, it could be used at the bug report emission site to give a precise description:

if (SS->isInErrorState()) {
  switch(SS.getErrorKind) {
  case FseekError:
    reportBug("After a failed fseek call, the state of the stream may "
              "have changed, and it might be feof() or ferror()!");
    break;
  case EofError:
    reportBug("The stream is in an feof() state!");
    break;
  case Errorf:
    reportBug("The stream is in an ferror() state!");
    break;
  case OtherError:
    // We don't know what the precise error is, but we surely
    // know its in one.
    reportBug("The stream is in an error state!");
    break;
  }

(If now the generic error states exist the diffs for adding more stream operations could be more simple by only adding new functions and not changing the StreamState at all.)

For the last case in the code snippet (OtherError), I'm not too sure what the conditions are -- when do we know what the stream state is (some sort of an error), but not know precisely how? In the case of fseek, we don't precisely know what the state is but we know how it came about. I just don't yet see why we need a generic error state.

Probably I can add another, or multiple ones, but then again the too-big-patch problem comes up.

I think the point of the patch is to demonstrate the implementation of an error state, not to implement them all, and it does it quite well!

For now, the EofError and OtherError can be removed, in this change these are not used (according to how fseek will be done).

I agree!

To avoid problems I created a new version of this diff too that follows after the other new ones:
https://reviews.llvm.org/D75682
(Adding a new diff can make inline comment positions even more inexact specially if the diff has many differences from an older one?)

martong added inline comments.Mar 6 2020, 6:37 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
93–126

In StdLibraryFunctionsChecker's evalCall, the return value is explicitly constructed, and further constraints on it are only imposed in postCall. I wonder why that is. @martong, any idea why we don't apply the constraints for pure functions in evalCall?

We could apply them in evalCall technically.
I think the reason why we don't do that is the matter of implementation, and more importantly this way we are consequent with the traditional Hoare logic: {Pre}C{Post} as {Post} is done in postCall.

To avoid problems I created a new version of this diff too that follows after the other new ones:
https://reviews.llvm.org/D75682
(Adding a new diff can make inline comment positions even more inexact specially if the diff has many differences from an older one?)

I'd strongly prefer to finish this and move on after that -- we have the discussion here, and a great looking patch with only a few things to address.

Could you please fix up the dependencies of this revision?

I have "mirrored" all 3 changes in this stack to the new series in D75682. Probably it is possible to reuse these revisions instead but I do not know if it will not confuse phabricator somehow (and how phabricator behaves in such "tricky" cases, there is not a usable documentation for it). The D75682 is the one that should be used now, it is the same change as this one plus the ferror and feof functions and tests. The new part with ferror and feof can be done in a new change but these belong logically into this change to make the error handling complete and testable (this change in current form is not good for tests).

This comment was removed by Szelethus.
Szelethus added a comment.EditedMar 6 2020, 7:55 AM

The D75682 is the one that should be used now,

If this patch is supposed to be a followup to D75682, could you please mark it as such? I find these revisions difficult to navigate.

I have "mirrored" all 3 changes in this stack to the new series in D75682. Probably it is possible to reuse these revisions instead but I do not know if it will not confuse phabricator somehow (and how phabricator behaves in such "tricky" cases, there is not a usable documentation for it).

Since this is the patch where we held the discussion about error states, I think it would be better for this revision land first, that would also solve the problem of inlines being all over the place. It doesn't really matter whether we're introducing error states first through feof and ferror, or the admittedly quirky fseek. WDYT?

balazske abandoned this revision.Mar 9 2020, 8:00 AM