Page MenuHomePhabricator

[Analyzer][StreamChecker] Introduction of stream error handling.
ClosedPublic

Authored by balazske on Mar 5 2020, 7:29 AM.

Details

Summary

A new field is added to the stream state to store the actual error.
Function fseek is implemented to set the error state.
(More are to be added later.)
Modeling of stream error state handling functions is added.

(clearerr is not done yet but should follow.)

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

What is better? Have this (or similar) change that adds a feature that is used in a later change and is only testable in that later change, or have a bigger change that contains real use of the added features? (This means: Add fseek to this change or not.) The error handling is not normally testable if there is no function that generates error state, fseek could be one.

balazske marked 4 inline comments as done.Mar 11 2020, 7:06 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68–80

Have a bool isOpened and an error kind is better?

What is better? Have this (or similar) change that adds a feature that is used in a later change and is only testable in that later change, or have a bigger change that contains real use of the added features? (This means: Add fseek to this change or not.) The error handling is not normally testable if there is no function that generates error state, fseek could be one.

I think neither: add the feature (the last enum value and its handling in the functions) in the fseek() patch were it is used.

And why not add first the "unknown error" state and later the feof and ferror error states? Or add every error state but not the feof and ferror functions? Every way results in an intermediate state of the checker.

I see that we're a bit stuck on naming, and that is largely my fault. Apologies, let us focus on high level stuff for a bit.

I do not understand totally this remove-of-AnyError thing. If handling the AnyError will be removed the code remains still not testable.

Remove the code that adds and handles AnyError. Merge the two enums in the StreamState, and make ensureSteamOpened emit a warning if the stream is either feof() or ferror(). Add NoteTags saying something along the lines of:

if (feof(F)) { // note: Assuming the condition is true
               // note: Stream 'F' has the EOF flag set
  fgets(F); // warning: Illegal stream operation on a stream that has EOF set
}

If we added warnings to this patch, that would be testable.

What is better? Have this (or similar) change that adds a feature that is used in a later change and is only testable in that later change, or have a bigger change that contains real use of the added features? (This means: Add fseek to this change or not.) The error handling is not normally testable if there is no function that generates error state, fseek could be one.

Definitely the latter, and all you need to do is check whether the stream is in an error state in ensureStreamOpened, no need for fseek just yet.

I think neither: add the feature (the last enum value and its handling in the functions) in the fseek() patch were it is used.

Yup.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68–80

That sounds wonderful, @balazske! :)

79

Okay, sure.

Also, thank you guys for chipping in!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
428

This is a very interesting question indeed, never crossed my mind before. I think it would be better to move AnyError to the next revision and discuss it there.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68–80

Yes, we do not need the OpenFailed state either. A stream is either opened or not. This is the state of the stream. If there is any error, there are the error codes for.

balazske updated this revision to Diff 249642.Mar 11 2020, 8:42 AM
  • Removed AnyError.
  • Using common code for evalFeof and ferror.
balazske updated this revision to Diff 249643.Mar 11 2020, 8:45 AM

Adding correct diff.

Harbormaster completed remote builds in B48825: Diff 249643.

Could you please add the warning we discussed? That would be a great demonstration of this patch's capabilities, and wouldn't deserve its own patch.

Could you please add the warning we discussed? That would be a great demonstration of this patch's capabilities, and wouldn't deserve its own patch.

Was it this warning?

if (feof(F)) { // note: Assuming the condition is true
               // note: Stream 'F' has the EOF flag set
  fgets(F); // warning: Illegal stream operation on a stream that has EOF set
}

The checker does not work by "assuming feof is true" (this can be a later feature, if the F is not tracked and a feof call is encountered). Value of feof (more precisely: if any error happened) is fixed when the file operation is called that causes the error. The warning above should be get even if the feof call is missing because it is still possible that the previous operation resulted in EOF state. Adding a similar warning is not a small change and the "infrastructure" for it is not ready in this revision. And still that warning would not be testable because FEof error state is never set in this patch. By the way, a file read is not invalid if any error is there, it simply fails again. But there is many possibility for warnings such as "file read called in EOF state", "feof called when it is proven to be false/true", "invalid file read/write after failed fseek/fread/fwrite" but these belong to next revisions.

Probably it would be better to make a full working prototype first, because the design decisions made now can turn out to be wrong later when the new functionality would be added and we do "not see the full picture" now.

balazske marked 9 inline comments as done.Mar 13 2020, 1:47 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68–80

The "error kind" concept is needed separately from the stream state, see D75851. This is a reason for not merging them.

417

This is not applicable any more, AnyError is removed (from here).

428

The state fork problem is solved with UnknownError is the later revision, here it is no problem any more. A single state split is always needed because the return values differ in the failed and success case, and I want to make correlation between return values and stream error state.

Thanks for sticking this out! It just takes me a while to get to the mental space regarding this patch you are already in, sometimes through saying incorrect statements or stupid suggestions! :) If others could chip in in this discussion, it would be most appreciated, because I fear that otherwise this patch is a bit too exposed to my own wrongs.

Could you please add the warning we discussed? That would be a great demonstration of this patch's capabilities, and wouldn't deserve its own patch.

Was it this warning?

if (feof(F)) { // note: Assuming the condition is true
               // note: Stream 'F' has the EOF flag set
  fgets(F); // warning: Illegal stream operation on a stream that has EOF set
}

The warning above should be get even if the feof call is missing because it is still possible that the previous operation resulted in EOF state.

What I meant to demonstrate (but failed to describe) is that if the condition of the if statement is !feof(F), we shouldn't get the warning.

The checker does not work by "assuming feof is true" (this can be a later feature, if the F is not tracked and a feof call is encountered). Value of feof (more precisely: if any error happened) is fixed when the file operation is called that causes the error. [...] And still that warning would not be testable because FEof error state is never set in this patch.

I suspect that at this point you are 2 steps ahead of me, but this is how I would imagine such a patch to look like: we evaluate feof() by splitting the state (this should be worth the cost, because why would we ever call feof if the state couldn't be EOF or non-EOF?), associate the state where its return value is true with the argument being marked as EOF, and the other as non-EOF.

This obviously wouldn't be sufficient -- values retrieved from stream reading operations may be EOF themselves, which should make us set the appropriate state for the originating stream, but that should indeed be the topic of a followup patch.

What is the goal here if not this? I have a suspicion I'm just not seeing the obvious here, but I don't get it just yet.

By the way, a file read is not invalid if any error is there, it simply fails again.

Oh, yea, silly me. You're totally right :)

Probably it would be better to make a full working prototype first, because the design decisions made now can turn out to be wrong later when the new functionality would be added and we do "not see the full picture" now.

If you try to implement a warning just to see how this would turn out long term, that would indeed be a good idea. Feel free to upload them, if you prefer, as a WIP patch!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
419–420

Is this correct? We set the feof() return value whether we know the state of the stream to be EOF, but in this case we would incorrectly mark it as non-EOF:

$ cat a.txt # contains a single newline

$ cat test.cpp
#include <cstdio>

void tryRead(FILE *F) {
  char C = fgetc(F);

  if (feof(F))
    printf("The stream is EOF!\n");
  else
    printf("The stream is good! Read '%c'\n", C);
}

int main() {
  FILE *F = fopen("a.txt", "r");
  tryRead(F);
  tryRead(F); // Incorrectly mark F to be non-EOF
}
$ clang test.cpp -fsanitize=address && ./a.out 
The stream is good! Read '
'
The stream is EOF!

Wouldn't it be safer to assume it to be true if we know its EOF and do nothing otherwise? How about a state split?

(I know this functions also handler FError, but you see what my point is.)

clang/test/Analysis/stream-error.c
34–35

Sure, we don't, but the bigger issue here is that we wouldn't mark F to be in EOF even if we did handle it, we would also need to understand that ch == EOF is the telling sign. But that also ins't in the scope of this patch :)

balazske marked an inline comment as done.Mar 13 2020, 9:14 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
419–420

The problem in this code seems to be that fgetc is not handled (yet) by the checker. The plan is to handle every possible file operation. The handler of fgetc would split the state to EOF and non-EOF case. In the current state this is a real "bug", it could be handled somehow by checking for escaped stream (the fgetc here should cause escape of the file pointer if not recognized as file operation). But this means again a new patch before this one. Or add every file operation to this patch with some simple implementation or with "forgetting" the stream?

NoQ added a comment.Mar 15 2020, 7:54 PM

It looks like for now there's no way to put the stream into an error state, right?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
68–80

Yes, we do not need the OpenFailed state either. A stream is either opened or not. This is the state of the stream. If there is any error, there are the error codes for.

Nope, there's also the state in which the stream was before it was opened, and it's different from being opened and closed. Most of the time we simply don't track the stream in such state, but when an open fails, we suddenly start tracking it, so we need a separate enum. Like, we'll have to emit different diagnostic messages depending on whether the stream is in Closed state or in an OpenFailed state and this information should be there at the bug report site, so it's part of the program state.

88

llvm::function_ref?

112

Simply C.getLocationContext().

392

So what exactly happens when you're calling clearerr on a closed stream? Does it actually become opened? Also what happens when you're calling it on an open-failed stream?

405–407

Maybe we should add a helper function for this, CallEvent::getOriginCallExpr(). That's an otherwise ugly function given that CallEvent is a class hierarchy and the superclass doesn't need to know about subclasses, but most users do in fact only care about call-expressions and concise checker API is important. Same with dyn_cast_or_null<FunctionDecl>(Call.getDecl()) that could be written as Call.getFunctionDecl().

414

I recommend not introducing a new symbol when you're going to return 0 anyway. A plain 0 is easier on the constraint solver than a symbol constrained to 0.

clang/test/Analysis/stream-error.c
34–35

Yeah, that's gonna be fun.

balazske marked 6 inline comments as done.Mar 16 2020, 2:00 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
88

function_ref's documentation says:

This class does not own the callable, so it is not in general safe to store a function_ref.

The FnDescription stores these functions so it is not safe to use llvm::function_ref?

392

The stream gets always in opened and error-free state. The preDefault checks that the stream is opened (not closed and not open-failed) and creates error node if not. If this create fails that is a problem case, the stream will be opened after clearerr. Should check in the eval functions for closed stream even if the pre-functions (in preCall callback) have normally checked this?

405–407

Can be done in another revision together with changing every use of dyn_cast_or_null<CallExpr>(Call.getOriginExpr()) to the new function.

clang/test/Analysis/stream-error.c
34–35

If there is a fputc call (for example) that is recognized (has recognized stream and no functions "fail") there is already an error and non-error branch created for it. On the error branch the return value of fputc would be EOF and the stream state is set to error. (The fputc should be later modeled by this checker, not done yet.)

If the ch == EOF is found then to figure out if this ch comes from an fputc that is called on a stream that is not recognized (it may be function parameter) and then make a "tracked" stream with error state for it, this is another thing.

NoQ added inline comments.Mar 16 2020, 7:02 AM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
88

I think you're using it only for global function pointers, no?

392

Aha, ok, got it.
Let's assert it then?

clang/test/Analysis/stream-error.c
34–35

Aha, fair enough, that's probably a well-justified state split.

balazske updated this revision to Diff 250588.Mar 16 2020, 10:07 AM
balazske marked 2 inline comments as done.

Addressed some comments.

balazske marked 3 inline comments as done.Mar 16 2020, 10:10 AM
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
88

Probably can work but I tried it and got compile errors.

Riiiight I think I finally get it. You don't want to state split on feof() and ferror(), but rather on the stream operations! This is why you only want to tell the analyzer what the return value of these functions are going to be, because the state of the stream should be set by the time we reach these functions, right?

How about untracked streams? What if we call feof() on a stream we got from a parameter, wouldn't that suggest that the stream is probably non-null and could either be EOF or non-EOF?

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
88

It doesn't matter a whole lot, we could leave this as-is :)

Riiiight I think I finally get it. You don't want to state split on feof() and ferror(), but rather on the stream operations!

Yes the split is at the operations. I did not think even of splitting at start of feof to determine "backwards" the result of the previous operation. This could be another approach. But the way of split depends on the previous operation (to check if error is possible based on possible constraints on its return value), probably not better (but less state split?).

How about untracked streams? What if we call feof() on a stream we got from a parameter, wouldn't that suggest that the stream is probably non-null and could either be EOF or non-EOF?

Currently the operation is completely ignored on untracked stream. Could start with a non-null stream and every error case is possible.

Riiiight I think I finally get it. You don't want to state split on feof() and ferror(), but rather on the stream operations!

Yes the split is at the operations. I did not think even of splitting at start of feof() to determine "backwards" the result of the previous operation. This could be another approach. But the way of split depends on the previous operation (to check if error is possible based on possible constraints on its return value), probably not better (but less state split?).

This is probably why it took a bit for us to understand each other! :) Generally speaking, we try to avoid state splitting as much as possible, but you totally convinced me, this isn't the place where we should be conservative. One should always expect stream operations to potentially fail, and that really does introduce two separate paths of execution.

The main issue that remains is testability, and now that we are on the same page, I see why you were concerned about it. I have a couple ideas:

  • For streams where the precise state is unknown (they are not tracked), start tracking. If we explicitly check whether a state is foef(), we can rightfully assume both of those possibilities.
  • Add debug function similar to clang_analyzer_express, like clang_analyzer_set_eof(FILE *), etc.
  • Evalulate a less complicated stream modeling function that sets such flags, though I suspect the followup patch is supposed to be the one doing this.

What do you think?

  • For streams where the precise state is unknown (they are not tracked), start tracking. If we explicitly check whether a state is foef(), we can rightfully assume both of those possibilities.

This can follow in a later revision only. The new stream should have every error state possible, so when it is introduced 3 new states are required (with the error types), or an "unknown error" is needed that will be added later.

  • Add debug function similar to clang_analyzer_express, like clang_analyzer_set_eof(FILE *), etc.

How to add this? Probably it is too much work and I do not like if separate such debug functions are there for "every" checker (or many of).

  • Evalulate a less complicated stream modeling function that sets such flags, though I suspect the followup patch is supposed to be the one doing this.

The fseek patch can do this.

  • For streams where the precise state is unknown (they are not tracked), start tracking. If we explicitly check whether a state is foef(), we can rightfully assume both of those possibilities.

This can follow in a later revision only. The new stream should have every error state possible, so when it is introduced 3 new states are required (with the error types), or an "unknown error" is needed that will be added later.

  • Evalulate a less complicated stream modeling function that sets such flags, though I suspect the followup patch is supposed to be the one doing this.

The fseek patch can do this.

Very well.

  • Add debug function similar to clang_analyzer_express, like clang_analyzer_set_eof(FILE *), etc.

How to add this? Probably it is too much work and I do not like if separate such debug functions are there for "every" checker (or many of).

@steakhal recently made a patch that contained that and not much else: D74131. I share you dislike with the debug function, but I suspect that it wouldn't be too much work, especially compared to my other suggestions that would introduce testability, unless you have some other ideas (to which I'd be open to!).

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
28

Create -> Creation

Adding special test functions is not as easy, then StreamState should be accessible from another checker. It could be added to the same file, or new file but then moving the data structures into header is needed. At least for short-term it is more simple to add a stream function that generates error state (fseek is applicable or other). This is why the fseek patch is better to be included in this change (but makes it more complicated). Otherwise this revision must be added without tests for state observer functions.

Adding special test functions is not as easy, then StreamState should be accessible from another checker. It could be added to the same file, or new file but then moving the data structures into header is needed.

For the time being, I don't fancy the idea of moving code to a header file.

At least for short-term it is more simple to add a stream function that generates error state (fseek is applicable or other). This is why the fseek patch is better to be included in this change (but makes it more complicated). Otherwise this revision must be added without tests for state observer functions.

It seems like D75356 should have been the revision to finish, after all :^). I agree -- don't worry about bloating the patch a bit, I've spend a lot of time with your patches, I feel confident in my ability at this point to be able to thoroughly review it.

So what is missing or wrong in this change to be accepted?

So what is missing or wrong in this change to be accepted?

This change is functional but is not testable. But the high level idea is great! I think we should maybe merge this with the followup, as you originally intended to.

Also, judging from the fact that how differently we imagined this to go (even though imo your idea is superior to mine), we should explain what our state splitting ideology is in the code.

balazske updated this revision to Diff 253587.Mar 30 2020, 7:06 AM

Added test checker.

Ping.
It is now testable, a sub-checker was added for testing (that can be useful at later improvements too).

Szelethus accepted this revision.Apr 6 2020, 3:39 AM

LGTM, let's go! Thank you so much for sticking this out!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
29–30

Can a stream be both feof and ferror? Let's not delay this patch one moment longer, but a TODO to give this some thought might be nice.

clang/test/Analysis/stream-error.c
1

core isn't enabled! :o

This revision is now accepted and ready to land.Apr 6 2020, 3:39 AM
balazske marked an inline comment as done.Apr 6 2020, 5:45 AM
balazske added inline comments.
clang/test/Analysis/stream-error.c
1

Is it needed? It was probably not set (and is not) in the original test in stream.c.

balazske updated this revision to Diff 255360.Apr 6 2020, 8:39 AM
  • Adding comment about ferror, feof.
  • Enabled core checker in test file.
NoQ added inline comments.Apr 6 2020, 11:29 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
462

This should go into the debug package. We don't want users to enable it accidentally by enabling the whole unix package.

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
25–27

assertStreamStateOpened(SS); isn't much harder to type. Can we make it a function please?

Szelethus added inline comments.Apr 6 2020, 12:09 PM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
462

Oh, nice catch!

clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
25–27

Mind that this is completely stripped from release builds, unlike a function call. I think.

NoQ added inline comments.Apr 6 2020, 12:47 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
25–27

I'd be shocked if that's the case. In LLVM we have a lot of tiny functions all over the place and we'd better have them inlined most of the time.

Szelethus added inline comments.Apr 6 2020, 1:14 PM
clang/lib/StaticAnalyzer/Checkers/StreamChecker.cpp
25–27

Ah, okay. Lets turn this into a regular function then!

balazske updated this revision to Diff 255602.Apr 6 2020, 11:58 PM
balazske marked an inline comment as done.

Moved test checker to debug package, changed macro to function.

Szelethus accepted this revision.Apr 7 2020, 5:22 AM

Whoo!

balazske updated this revision to Diff 255906.Apr 7 2020, 11:51 PM

Adding diff to master.

This revision was automatically updated to reflect the committed changes.