This is an archive of the discontinued LLVM Phabricator instance.

[clang][analyzer] Use generic note tag in alpha.unix.Stream .
Needs ReviewPublic

Authored by balazske on Jul 19 2021, 3:57 AM.

Details

Summary

The multiple note tag functions are replaced with one that
works for all cases.

Diff Detail

Event Timeline

balazske created this revision.Jul 19 2021, 3:57 AM
balazske requested review of this revision.Jul 19 2021, 3:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 3:57 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
Szelethus added inline comments.Jul 20 2021, 2:54 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
427–438

I have ambivalent feelings on this. I see what you are shooting for: display a specific NoteTag only for a specific BugType, though I wonder whether whether some of these notes would be nice for more than one. The only test case that changed seems to support my theory, or at least I like it better.

clang/test/Analysis/stream-note.c
36–41

I think I preferred this, honestly.

balazske added inline comments.Jul 20 2021, 6:08 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
427–438

I will update the patch so that it supports multiple bug types in one note and bug type specific message. This will be done together with the adding of the new notes so there will be test for the new functionality.

balazske updated this revision to Diff 360124.Jul 20 2021, 7:32 AM

Support multiple bug types in the note tag function.
Add the note tag to every place and update tests.

Szelethus added inline comments.Jul 23 2021, 8:42 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
236–242

Well this looks odd: for both BT_StreamEof AND BT_IndeterminatePosition we want to display a NoteTag for a failed fseek, for one you print "Assuming stream reaches end-of-file here", for the other, "Assuming this stream operation fails". This seems to contradict your comment in the fseek evaluating function:

If fseek failed, assume that the file position becomes indeterminate in any case.

Also, these BugTypes should be responsible for the *error message*, not the NoteTag message. I'd prefer if we mapped an enum to these strings (NoteTagMsgKind?), pass that as well to constructNoteTag, and allow the caller to decide that for which BugTypes the NoteTag is worth displaying for.

I think such a 4-argument constructNoteTag would capture best what we want here.

425–432

How about:

Create a NoteTag describing an stream operation (whether stream opening succeeds or fails, stream reaches EOF, etc).
As not all operations are interesting for all types of stream bugs (the stream being at an indeterminate file position is irrelevant to whether it leaks or not), callers can specify in BT for which BugTypes should this note be displayed for.
Only the NoteTag closest to the error location will be added to the bug report.

clang/test/Analysis/stream-note.c
36–41

Hmmm... I've given this some thought, and yes, I the stream misuse can indeed be captured starting from the last freopen call. The specialized message for reopen was nice, but I guess no actual information was lost by this patch.

51

I'd prefer an individual line for these expected-.* directives. Its down to personal preference, but I find that far easier to read.

balazske added inline comments.Jul 26 2021, 1:29 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
236–242

It is still unclear how to model the fseek (and other functions). (But this is not a problem for this patch.) We can do it according the POSIX or C standard, or just by the experience that a fseek may fail with EOF or ferror or none of them. The standards are not exact but do not mention that fseek should cause the indeterminate flag to be set at all, or that fseek can cause feof state.

236–242

The message for a NoteTag depends on the bug type and at this state the bug type is sufficient to determine the note message. Because it is possible to add multiple bug types to a NoteTag, passing a custom message to it can be done only by passing a BugType->Message map to the note tag. This may be unnecessary complexity for the current use case.

425–432

The NoteTag is added at a place where a possible future bug is introduced. The bug type indicates which bug is the one that can happen after this event. If this bug is really detected the last NoteTag for this type (ignore other NoteTags with non-matching bug type) contains the relevant information.

Szelethus requested changes to this revision.Jul 29 2021, 3:47 AM

I like the generalization still, but I don't agree with how you retrieve the NoteTag message. Its the wrong way around. This is how you invoke your function:

void StreamChecker::evalFclose(/* ... */, CheckerContext &C) const {
  ProgramStateRef State = C.getState();
  SymbolRef Sym = /* get stream object */;
  const StreamState *SS = State->get<StreamMap>(Sym);

  // early returns, asserts, etc...

  // Close the File Descriptor.
  // Regardless if the close fails or not, stream becomes "closed"
  // and can not be used any more.
  State = State->set<StreamMap>(Sym, StreamState::getClosed(Desc));

  C.addTransition(State, constructNoteTag(C, Sym, {&BT_UseAfterClose})); // <--- (#)
}

What are you telling in (#)? Its confusing, no bug occurred here, what is a BugType doing here? Are you guessing that if a bug will occur, the bug will have that specific BugType? You need to go out of your way to find the mapping to BugMessages, read a bunch of documentation and comments to realize what the intent was. Even if its algorithmically correct, its a bit upside down.

If you passed the note message:

C.addTransition(State, constructNoteTag(C, Sym, "Stream closed here"));

// or from some NoteMessageKind or whatever enum:

C.addTransition(State, constructNoteTag(C, Sym, NMK_StreamClosedHere));

you could move the logic of displaying this note only for specific kinds of BugTypes, or a single BugType into constructNoteTag. That would be a big improvement already. The call site will be very telling of whats happening: you are constructing a NoteTag with the string that is passed as an argument. I realize this feels like over the top nitpicking, but easily readable code is very valuable, not to mention this is the experience I had when I read this patch for the first time.

Of course, the person who will have to debug this checker later on as to why the NoteTag isn't displaying for a different BugType to that will again have to go through that function and the comments to understand that these messages are only displayed for specific BugTypes, this is why my four-argument suggestion would be, in my world, the ideal solution. I don't think that adds any complexity, but would make the code a lot more readable.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
236–242

It is still unclear how to model the fseek (and other functions). (But this is not a problem for this patch.)

I agree, we can discuss that another time.

The message for a NoteTag depends on the bug type and at this state the bug type is sufficient to determine the note message.

Aha, so the claim is that a BugType can unambiguously determine the NoteTag message. I get it: among all the NoteTags this checker is capable of emitting, you only want to display the one that directly caused a specific kind of a bug. If a stream operation is done on a stream object that is already at EOF, the only noteworthy NoteTag could be the one where the stream reached the end of file. If the stream leaked, we only need to mention where the stream was opened.

You (even if implicitly) claim that, for example, where the stream is opened wouldn't ever be interesting to any any programming error, but stream leaking.

What I meant in my previous comments is that this is not necessarily true, or will not stay true for long. But, lets stick with your idea now, and expand later if needed.

Because it is possible to add multiple bug types to a NoteTag, passing a custom message to it can be done only by passing a BugType->Message map to the note tag. This may be unnecessary complexity for the current use case.

BugType has no Message field. Even if it did, it should *only* describe the error node. Its a single point in the analysis, it should not, and does not know anything about what happened in past. That's the job of bug report generation facilities to figure out.

This revision now requires changes to proceed.Jul 29 2021, 3:47 AM

The bug type is passed to constructNoteTag only to identify what message will be displayed. The bug types that are related to the current function (a message should be here if the bug is happening) are passed in. It should be enough to look at comment before constructNoteTag to check this (and to the bug type -> message map).

C.addTransition(StateFailed, constructNoteTag(C, StreamSym,
                                              {&BT_StreamEof,
                                               &BT_IndeterminatePosition}));

This indicates that if later an EOF error happens (function called in EOF state) this is the place to display a message "stream becomes EOF here". And if later an error "stream position indeterminate" happens the note should be here too, but with other message. So we can not have a single string as last parameter instead of the bug types. It is possible to add multiple note tags but I do not like it because this makes more ExplodedNodes. Or it is possible to figure out the needed message in the NoteTag function from the program state but this may be relatively difficult task and more code than the current solution.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
425–432

This is the planned comment at constructNoteTag:

/// Create a NoteTag to display a note if a later bug report is generated.
/// This should be added added at a place where a possible future bug is
/// introduced. The bug type indicates which bug is the one that can happen
/// after this event. If this bug is really detected the last NoteTag for
/// its type (ignore other NoteTags with non-matching bug type) contains the
/// relevant information (location and text message).

The bug type is passed to constructNoteTag only to identify what message will be displayed. The bug types that are related to the current function (a message should be here if the bug is happening) are passed in. It should be enough to look at comment before constructNoteTag to check this (and to the bug type -> message map).

C.addTransition(StateFailed, constructNoteTag(C, StreamSym,
                                              {&BT_StreamEof,
                                               &BT_IndeterminatePosition}));

This indicates that if later an EOF error happens (function called in EOF state) this is the place to display a message "stream becomes EOF here". And if later an error "stream position indeterminate" happens the note should be here too, but with other message. So we can not have a single string as last parameter instead of the bug types.

Right, okay, I see that I looked over something important.

I forgot our optimization trickery. It is reminiscent of a Schrödinger's cat: after a failed stream operation, the stream object could simultaneously be in either of these states:

  • Reached EOF,
  • Have an indeterminate file position,
  • Have ferror set.
TL;DR:

No matter what happens during analysis, as of now, the NoteTag message is unambiguous at the point of the tags construction. Do we have plans to change our current modeling around indeterminate file positions, or add a warning for FERROR that would necessitate the proposed complexity?

Onto the rest:

Its at the next stream operation where its decided what state it was actually put in. So, if a stream operation like fseek fails, it may leave the stream object in any of the above states. Later, when we analyze fread, we check in preFread in order whether the stream is null, is opened, its file position indicator is determinate, and whether the stream has EOF set. Whichever error is detected first will be the one to get emitted, and in this case, its going to be an indeterminate file position.

void f(FILE *f) {
  fseek(f, offset, origin); // note: stream operation failed
  fread(ptr, size, count, f); // warn: indeterminate file position
}

Now, lets check explicitly whether fseek left the stream in FERROR, and return early if so. Lets discuss how fseek should work another time, but for the sake of this example, *suppose* that it only leaves the file position indicator indeterminate if it also sets FERROR, and ferror respects this supposed behaviour. In that case, only the EOF flag remains.

void f(FILE *f) {
  fseek(f, offset, origin); // note: assuming stream reached eof
  if (ferror(f)) // note: assuming the condition is false
    return;
  fread(ptr, size, count, f); // warn: stream already in eof
}

The bug type changed, and it would be handy if the note tag changed as well. It really is the inspection of the bug type (hence the Schrödinger-like behaviour) that decides what error was actually inflicted upon the stream object at fseek.


Now, its worth noting that in practice, NoteTags never change their message. Lets take a survey:

BugType BT_FileNull{this, "NULL stream pointer", "Stream handling error"};
BugType BT_UseAfterOpenFailed{this, "Invalid stream",
                              "Stream handling error"};

The only meaningful NoteTag is for both of these is added around a failed (re)open, and the emitted string shouldn't depend on which one it is. Stream objects are never in a "schrödinger-like" state in terms of these bugs -- a stream is either NULL, or NULL as a result of a failed open.

BugType BT_IllegalWhence{this, "Illegal whence argument",
                         "Stream handling error"};

This is practically a statement-local issue, and even if the whence argument was calculated elsewhere, its hardly the job of this checker to track down.

BugType BT_UseAfterClose{this, "Closed stream", "Stream handling error"};

The only noteworthy event here is where was the stream was closed. Stream closure cannot fail, the only state a stream can remain in after a call to close is it being closed, so the note message is unambiguous.

BugType BT_ResourceLeak{this, "Resource leak", "Stream handling error",
                        /*SuppressOnSink =*/true};

The only noteworthy event is stream opening. Now, opening can fail, but we split the state immediately there, and we can either write "Opened stream here", or "Opening stream failed here". On each of these path, this message is unambiguous.

BugType BT_IndeterminatePosition{this, "Invalid stream state",
                                 "Stream handling error"};
BugType BT_StreamEof{this, "Stream already in EOF", "Stream handling error"};

Alright, so this is the juicy one. As demonstrated above, a number of stream operations can result in 3 types of erroneous stream states. However, even here, the note tag message in unambiguous, mostly because of two reasons:

  • We don't emit warnings for FERROR (as you've said in D80015#2043263),
  • As of now, you cannot get rid of indeterminate file position indicator. clearerr() (rightfully) leaves it as-is, branching on ferror leaves it on unconditionally.

So, because we check indeterminate file position is a more serious error then an EOF error (or FERROR, if we were to emit a warning for it), if a stream operation *could* result in an indetermiante file position, the note tag will *always* be that the stream was left in it. In any other case, the note message will be that the operation left the stream object at EOF.

In this table, you can see the possible flags the stream object can hold at the NoteTag construction point, what the note tag message *will* be, and what the resulting bug type *will* be.

EOFFERRORIndet. posNote message at the stream operationPossible bug types
truetruetruestream object's file position indicator is left indeterminateBT_IndeterminatePosition
truefalsetruestream object's file position indicator is left indeterminateBT_IndeterminatePosition
falsetruetruestream object's file position indicator is left indeterminateBT_IndeterminatePosition
falsefalsetruestream object's file position indicator is left indeterminateBT_IndeterminatePosition
truefalsefalsestream object reached EOF hereBT_StreamEof
truetruefalsestream object reached EOF hereBT_StreamEof
falsetruefalsenothingnothing

Note that the only way to get of the indeterminate file position is either to reopen the stream, but if that succeeds, all stream error flags are reset, or call fseek or something similar again, but then that would be the latest event, and the NoteTag would be placed there.


Lets conclude with a few questions:

  • Do we ever intend to create a warning for FERROR? If so, can we add it before this patch to test the change on the NoteTag message?
  • Will there ever be a way to get rid of the indeterminate position indicator tag (possibly by getting rid of the FERROR flag at the same time)? If so, can we add a void StreamTesterChecker_remove_indeterminate_file_pos_tag(FILE *) function and test the change on the NoteTag message?

If the answer to both of those is no, then we don't need this complexity ;)

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
532

Maybe we could add BT_UseAfterOpenFailed?

No matter what happens during analysis, as of now, the NoteTag message is unambiguous at the point of the tags construction.

The stream error state is designed to store a "superposition" of different error states. This is part of evalFseek:

StateFailed = StateFailed->set<StreamMap>(
    StreamSym,
    StreamState::getOpened(Desc, ErrorNone | ErrorFEof | ErrorFError, true));
C.addTransition(StateFailed,
                constructNoteTag(C, StreamSym,
                                 {&BT_StreamEof, &BT_IndeterminatePosition}));

The code above means the same as this code:

StateFailedFEof = StateFailed->set<StreamMap>(
    StreamSym,
    StreamState::getOpened(Desc, ErrorFEof, false));
C.addTransition(StateFailedFEof, constructNoteTag(C, StreamSym, {&BT_StreamEof}));

StateFailedFErrorIndeterminate = StateFailed->set<StreamMap>(
    StreamSym,
    StreamState::getOpened(Desc, ErrorFError, true));
C.addTransition(StateFailedFErrorIndeterminate, constructNoteTag(C, StreamSym, {&BT_IndeterminatePosition}));

StateFailedIndeterminate = StateFailed->set<StreamMap>(
    StreamSym,
    StreamState::getOpened(Desc, ErrorNone, true));
C.addTransition(StateFailedIndeterminate, constructNoteTag(C, StreamSym, {&BT_IndeterminatePosition}));

In this code the NoteTag message is unambiguous. The code above is a "compression" of this and at the construction of the NoteTag message is in "superposition" state too (like the error state that determines the message).

freopen is a different case: At failure it returns a NULL pointer and the stream becomes in an invalid state. But if the return value of freopen is not assigned to it, the stream pointer is not set to NULL:

  • F = freopen(0, "w", F); F will be NULL at failure, this makes a bug with BT_FileNull bug type.
  • freopen(0, "w", F); F becomes invalid but not NULL, if used then a BT_UseAfterOpenFailed bug is detected (this is the only case for this bug type).

At failure of freopen we can not tell which case will occur later (it depends on the analyzed code). (It may be possible to have only one bug for these cases but a differentation is needed to tell the user if a file was NULL or invalid but not NULL.)

Do we ever intend to create a warning for FERROR?

Probably not (but there may be some not yet modeled stream operation where this may be useful.). This is when FERROR is set but not the indeterminate position. It indicates that the last operation failed but it is OK to use the stream. But this can be a "ErrorReturn" checker problem (a failed operation was not observed by the program).

Will there ever be a way to get rid of the indeterminate position indicator tag?

Currently the fseek can reset this flag, and the freopen is another case for it.

How about we just change the NoteTag to this: "Failed stream operation could have left the error or eof flags set, or the file position indicator indeterminate"? We could add a NoteTags to feof and ferror that narrows this down, for example:

fread(F); // note: Failed stream operation could have left the error or eof flags set
// ...
if (ferror(F)) // note: Assuming F does not have the error flag set
  return;
fread(F); // warning: Read function called when stream is in EOF state. Function has no effect.
fread(F); // note: Failed stream operation could have left the error or eof flags set
// ...
if (feof(F)) // note: Assuming F does not have the eof flag set
  return;
fread(F); // warning: Read function called when stream has the error flag set. Maybe call clearerr()?
fseek(F); // note: Failed stream operation could have left the error or eof flags set, or the file position indicator indeterminate
// ...
if (feof(F)) // note: Assuming F does not have the eof flag set
  return;
//...
fread(F); // warning: File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior.

Seems like solid future proofing is you change your mind on emitting warnings on FERROR or we realized if some operation could ensure that the file position indicator is determinate.

Do we ever intend to create a warning for FERROR?

Probably not (but there may be some not yet modeled stream operation where this may be useful.). This is when FERROR is set but not the indeterminate position. It indicates that the last operation failed but it is OK to use the stream. But this can be a "ErrorReturn" checker problem (a failed operation was not observed by the program).

I guess its a bad practice to not check whether the stream has FERROR, and I guess using another stream operation is as good of a point to check as any, so I'd definitely do it here than in ErrorReturn. But for now, alright, lets leave this for another day.

Will there ever be a way to get rid of the indeterminate position indicator tag?

Currently the fseek can reset this flag, and the freopen is another case for it.

Well, sure, but they would also become the last stream operation. These Schrödinger cases are a problem because failed stream operation #1 could leave the stream object in a number of error states (e.g. fseek), and event #2 (e.g. checking whether the return value from fgetc equals to EOF) and stream operation #3 (e.g. using StreamTesterChecker_remove_indeterminate_file_pos_tag()) narrowed this down to one specific error kind (e.g. its in FERROR). Basically, events #2 and #3 determine what happened at #1, and since #1 is the error causing event, we need to highlight it somehow. If #2 is an fseek, freopen, or other call that resets the entire stream object, then it will become the last event the highlight, making #1 no longer interesting at all.

No matter what happens during analysis, as of now, the NoteTag message is unambiguous at the point of the tags construction.

[...]
In this code the NoteTag message is unambiguous. The code above is a "compression" of this and at the construction of the NoteTag message is in "superposition" state too (like the error state that determines the message).

Can you give me a few test cases on this patch where the NoteTag indeed changes on a stream operation? My long comment meant to demonstrate that there doesn't exist such, but I'm happy to be proven wrong. My questions are about whether there *could* exist such a case in the future.

freopen is a different case: At failure it returns a NULL pointer and the stream becomes in an invalid state. But if the return value of freopen is not assigned to it, the stream pointer is not set to NULL:

  • F = freopen(0, "w", F); F will be NULL at failure, this makes a bug with BT_FileNull bug type.
  • freopen(0, "w", F); F becomes invalid but not NULL, if used then a BT_UseAfterOpenFailed bug is detected (this is the only case for this bug type).

At failure of freopen we can not tell which case will occur later (it depends on the analyzed code). (It may be possible to have only one bug for these cases but a differentation is needed to tell the user if a file was NULL or invalid but not NULL.)

Sure, but the NoteTag assigned to freopen still wouldn't change (it just wouldn't show for the former case, as writing to F would clear the interestingness on it).

balazske updated this revision to Diff 364730.Aug 6 2021, 2:07 AM

Using "joined" note tag messages to have bugtype independent note tag functions.
New note tags at ferror and feof.

Alright, I think we're almost ready to go! I left a few comments, please make sure to mark those done that you feel are properly addressed.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
418–419

The main thing to highlight here is that its not only the last failing operation, but more importantly that this operation caused the bug to occur.

425–432

Aha, okay, so you need a NoteTag that removes interesstingness in the case where we found the stream operation that caused the bug report, and one that does not remove interesstingness in the case where a stream operation is worth explaining, but is not the cause. Fair enough!

In the function name, you use the word "failure", but state that its not always a failure that the NoteTag describes. How about constructLatestNoteTag, and constructNoteTag? I think that explains what happens in the function (and its comments) better.

428–429

Same here, mention that its the failed stream operation that caused the bug is what we're specifying further.

533
589
708

Lets leave a TODO here, before we forget it:
C'99, pdf page 313, §7.19.8.1.2, Description of fread:

If a partial element is read, its value is indeterminate.

736–739

We can be more specific here. While the standard doesn't explicitly specify that a read failure could result in ferror being set, it does state that the file position indicator will be indeterminate:

C'99, pdf page 313, §7.19.8.1.2, Description of fread:

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

C'99, pdf page 313, §7.19.8.2.2, Description of fwrite:

If an error occurs, the resulting value of the file position indicator for the stream is indeterminate.

Since this is the event to highlight, I'd like to see it mentioned. How about:

Stream either reaches end-of-file, or fails and has its file position indicator left indeterminate, or the error flag set.
After this operation fails, the stream either has its file position indicator left indeterminate, or the error flag set.

Same for any other case where indeterminate file positions could occur.

796

Like here.

935–937

Please leave a TODO here, don't fix now.

971

Stating that it happened as a result of a failed operation seems kind of redundant, especially if the NoteTag states that as well. Lets leave a TODO here to address this warning message, but leave as-is for now.

1160

Lets put one there!

clang/test/Analysis/stream-note.c
36–41

You can mark these done.

51

I'd like to see this addressed. Lets have a new line for each directive, at least where the 80 column limit is reached.

balazske marked 2 inline comments as done.Aug 9 2021, 11:53 PM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
708

This means that the (content of the) buffer passed to fread should become in a "uninitialized" (undefined) state?

736–739

For the fread and fwrite cases, I think that the error flag and the indeterminate position is always set if error occurs. It looks more natural to tell the user that "the operation fails" than "file position becomes indeterminate". And the user could see that the operation fails and file position is "indeterminate" from the error reports, the failure causes the indeterminate (or "undefined"?) position.

Only the fseek is where indeterminate position can appear without setting the ferror flag (but the failure is discoverable by checking the return value of fseek). Still the cases "operation fails" (set ferror flag and/or leave file position indeterminate, return nonzero) and "stream reaches end-of-file" are the ones that are possible. The checker documentation can contain more exactly why the checker works this way.

Szelethus added inline comments.Aug 11 2021, 5:23 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
708

On the return value:

The fread function returns the number of elements successfully read, which may be less than nmemb if a read error or end-of-file is encountered.

So I guess only the (return value + 1)th element of the array is indeterminate.

736–739

Well, to me, seeing both the error flag and the file position indicator being mentioned here sounds nice, since we are already in the possession of that information. How about

Stream either reaches end-of-file, or fails and has its file position indicator left indeterminate and the error flag set.
After this operation fails, the stream either has its file position indicator left indeterminate and the error flag set.

?

The checker documentation can contain more exactly why the checker works this way.

I think adding this bit about the file position indicator shouldn't be in the docs only, though explaining the schrödinger-like behaviour in there would be nice.

I think that the error flag and the indeterminate position is always set if error occurs.

This means that we need to make our ferror in the future smarter. Can you leave a TODO about that ferror needs to check what was the last stream operation that may have failed? In the case where it was an fread/fwrite, on its false branch, we need to clear both ferror and the file position indocator.

balazske added a comment.EditedAug 11 2021, 8:32 AM

Really I still not understand why the previous BugType dependent NoteTag functions were bad design (except that it could make the code difficult to understand). If we would have the BugType available in the NoteTag, we could make the decision about what to display and no "or"s are needed in the message. We do not need a "Stream either reaches end-of-file, or fails and has its file position indicator left indeterminate and the error flag set." message if the information is available about what the exact problem was (from the BugType) and we can build a "Stream reaches end-of-file." if the bug was end-of-file related, and so on. (Really instead of the bug type other information could be used that is passed from the bug report to the note tag, but there is no way for this to do?) Otherwise just the user has to find out the same thing by looking at later functions and notes. So I want to wait for the opinion of another reviewer(s) before proceeding.

Really I still not understand why the previous BugType dependent NoteTag functions were bad design (except that it could make the code difficult to understand). If we would have the BugType available in the NoteTag, we could make the decision about what to display and no "or"s are needed in the message. We do not need a "Stream either reaches end-of-file, or fails and has its file position indicator left indeterminate and the error flag set." message if the information is available about what the exact problem was (from the BugType) and we can build a "Stream reaches end-of-file." if the bug was end-of-file related, and so on. (Really instead of the bug type other information could be used that is passed from the bug report to the note tag, but there is no way for this to do?)

My strongest arguments for the the notes I suggested is that this would give the complete picture. Suppose I see the note "stream has its error flag set", and fix the FERROR case (add an ferror() check with an early return), only to stumble across the very same bug an hour later, but now with an EOF in the note message. I could have added an early return about EOF as well, but I didn't know that function was modeled with EOF set as well. While my suggested note is long, I think this is about the shortest that would give users the complete picture, and I don't think that a longer note is something to avoid (CodeChecker users in particular have to deal with lot longer macro expansions already, and they are still useful IMO).

While I really believe that this would be the better option of the two, I don't strongly insist on it, especially if others see it a subpar option as well.

Otherwise just the user has to find out the same thing by looking at later functions and notes.

I believe the general advice for users is to read bug reports not from top to bottom, but from the error report up (as its likely that the essence of the bug can be summarized far before we get to the notes around where the analysis started). I think the long note version would be better in that case, it'd more clearly display how the analyzer reached the conclusion.

So I want to wait for the opinion of another reviewer(s) before proceeding.

Another set of eyes could never hurt, indeed!

We had a meeting outside phabricator, here is the gist of it:

  • It'd be better to have your initially proposed schrödinger-like NoteTags.
  • The intent is to emit a warning for each of the error states in StreamState. It seems to me that this is not what we're doing now, but, similarly to adding warnings fro FERROR and being able to clear the indeterminate file position indicator state, would truly display how NoteTags might change their message depending on the BugType.
  • There are numerous discussions to be had on how to handle fseek, how much do FERROR and the file position indicator imply one another, but for this patch, adding debug functions such as void StreamCheckerTest_remove_indeterminate_file_pos_tag(FILE *) or void StreamCheckerTest_warn_if_in_ferror(FILE *) would allow as to test the functionality added by this patch, and land it.

As such, I'd like if you implemented the above mentioned functions with their expected functionality. I'd like to test cases where the very same function call gets a different note message depending on the BugType:

void fseek_caused_feof() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET); // expected-note {{Stream reaches end-of-file here}} <============= (*)
  if (ferror(F)) // expected-note {{Assuming the error flag is not set on the stream}}
                 // expected-note@-1 {{Taking false branch}}
    return;
  StreamCheckerTest_remove_indeterminate_file_pos_tag(F);
  fread(Buf, 1, 1, F); // expected-warning{{Read function called when stream is in EOF state. Function has no effect}}
  // expected-note@-1{{Read function called when stream is in EOF state. Function has no effect}}
  fclose(F);
}

void fseek_caused_ferror() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET); // expected-note {{Stream has its error flag set}} <============= (*)
  if (feof(F)) // expected-note {{Assuming the end-of-file flag is not set on the stream}}
               // expected-note@-1 {{Taking false branch}}
    return;
  StreamCheckerTest_remove_indeterminate_file_pos_tag(F);
  StreamCheckerTest_warn_if_in_ferror(F); // expected-warning{{Read function called when stream is in FERROR state.}}
  // expected-note@-1{Read function called when stream is in FERROR state.}
  fclose(F);
}

void fseek_caused_indeterminate_file_pos() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET); // expected-note {{Stream operation leaves the file position indicator indeterminate}} <============= (*)
  clearerr(F);
  fread(Buf, 1, 1, F); // expected-warning{{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  // expected-note@-1{{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  fclose(F);
}

Maybe some others in the spirit of these 3.

balazske added a comment.EditedAug 31 2021, 9:21 AM

Representation of the stream error state in the checker:
The error state is stored in 4 separate flags: NoError, FEof, FError, Indeterminate.
This is to make the "Schrödinger" states possible. The first 3 flags are mutually exclusive for a single stream state. If more than one is set it is a combined state. This simulates multiple execution paths in which only one of the flags is set. For example if NoError and FEof is true it is 2 execution paths, one with no error and one with FEOF. Additionally there is the Indeterminate value. If this is true, the file position is indeterminate on all represented execution paths where applicable. It is applicable only if NoError or FError is true. If we have a state of (true, true, true, true), it means combination of 3 execution paths: (true, false, false, true) (none of error flags in the stream but position indeterminate), (false, true, false, false) (FEOF, indeterminate ignored here because it is not applicable), (false, false, true, true) (FERROR and indeterminate).

I like that! Though for now, any tests that displays how these notes can emit different messages for different BugTypes will suffice, so we can bypass other design discussions.

The following test is not good in place of the 3rd? It is nearly the same, only the condition (!ferror(F) && !feof(F)) is used instead of call to clearerr.

void check_indeterminate_notes_fseek_no_feof_no_ferror() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET);      // expected-note {{Assuming this stream operation fails}}
  if (!ferror(F) && !feof(F)) // expected-note {{Taking true branch}} // expected-note {{Left side of '&&' is true}}
    fread(Buf, 1, 1, F);      // expected-warning {{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  // expected-note@-1 {{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  fclose(F);
}

The condition can be changed to feof(F) or ferror(F) to change the bug type and the note message at fseek.

The following test is good too to test if the message depends on the bug type (note check comments are not included):

void check_notes_fseek() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET);
  fread(Buf, 1, 1, F); // expected-warning {{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior [alpha.unix.Stream]}}
                       // expected-warning@-1 {{Read function called when stream is in EOF state. Function has no effect [alpha.unix.Stream]}}
  fclose(F);
}
balazske updated this revision to Diff 370192.Sep 2 2021, 1:59 AM

Using again bug type dependent note tag messages.
Some new tests are added.
The planned debug functions are not added yet.

I like everything I see here so far! As soon as those debug functions are in, the patch should land!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
236–240

What this means might be obvious to us, because we are very familiar with this and similar checkers, but I'm sure its confusing for anybody else, even for seasoned analyzer developers. I'd prefer if comments like these were accompanied with examples. There are a few decent ones I think in some of the comments I left on this revision, feel free to copy one here.

408–410

How about you explain this logic thoroughly in one comment (maybe above BugMessages), and replace these last 3 lines with "See the comments for BugMessages."?

Probably it is better to make a big comment at the start of the file that explains how the checker works, like in FuchsiaHandleChecker. This comes in a separate patch.

balazske added a comment.EditedSep 17 2021, 2:29 AM

I'd like to test cases where the very same function call gets a different note message depending on the BugType:

These tests are doing this:

void check_indeterminate_notes_fseek_no_feof_no_ferror() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET);      // expected-note {{Assuming this stream operation fails and leaves the file position indeterminate}}
  if (!ferror(F) && !feof(F)) // expected-note {{The error flag is not set on the stream}} expected-note {{The end-of-file flag is not set on the stream}}
                              // expected-note@-1 {{Taking true branch}} expected-note@-1 {{Left side of '&&' is true}}
    fread(Buf, 1, 1, F);      // expected-warning{{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  // expected-note@-1{{File position of the stream might be 'indeterminate' after a failed operation. Can cause undefined behavior}}
  fclose(F);
}

void check_feof_notes_fseek() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET); // expected-note {{Assuming stream reaches end-of-file here}}
  if (feof(F))           // expected-note{{The end-of-file flag is set on the stream}} expected-note {{Taking true branch}}
    fread(Buf, 1, 1, F); // expected-warning {{Read function called when stream is in EOF state. Function has no effect}}
  // expected-note@-1 {{Read function called when stream is in EOF state. Function has no effect}}
  fclose(F);
}

Why is the last test not sufficient and I should use instead this test?

void check_notes_fseek_caused_feof() {
  FILE *F;
  char Buf[10];
  F = fopen("foo1.c", "r");
  if (!F) // expected-note {{Taking false branch}} expected-note {{'F' is non-null}}
    return;
  fseek(F, 1, SEEK_SET); // expected-note {{Assuming stream reaches end-of-file here}}
  if (ferror(F)) { // expected-note {{The error flag is not set on the stream}}
                 // expected-note@-1 {{Taking false branch}}
    fclose(F);
    return;
  }
  StreamTesterChecker_clear_indeterminate_file_position(F);
  fread(Buf, 1, 1, F); // expected-warning{{Read function called when stream is in EOF state. Function has no effect}}
  // expected-note@-1{{Read function called when stream is in EOF state. Function has no effect}}
  fclose(F);
}

In check_feof_notes_fseek before the fread the EOF bit is on. In check_notes_fseek_caused_feof there are 2 possibilities, one is when fseek did not fail at all, other when it failed and then EOF is on. This are really 3 execution paths: fseek did not fail, fseek failed with EOF, and fseek failed without FEOF or FERROR but leaves indeterminate position (that is cleared, so we get the same state as after success).