This is an archive of the discontinued LLVM Phabricator instance.

[analyzer][StdLibraryFunctionsChecker] Add POSIX file handling functions
ClosedPublic

Authored by martong on Jun 22 2020, 2:40 AM.

Details

Summary

Adding file handling functions from the POSIX standard (2017).
A new checker option is introduced to enable them.

In follow-up patches I am going to upstream networking, pthread, and other
groups of POSIX functions.

Diff Detail

Event Timeline

martong created this revision.Jun 22 2020, 2:40 AM
Herald added a project: Restricted Project. ยท View Herald TranscriptJun 22 2020, 2:40 AM

Good job, I really fancy the Summary syntax ๐Ÿ‘

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
740

I presume that typically there are only a handful of Decls in LookupRes, so it not worth the complexity of using std::algorithms to sort/partition/find.

1591

If Off_tMax is only used in the if block that follows, consider moving it's declaration inside.

I see a lot of NoEvalCall, but I wonder whether modifying errno warrants this. Shouldn't we have a alongside NoEvalCall and EvalCallAsPure an EvalCallAsPureButInvalidateErrno invalidation kind?

Also, I'm kind of worried by this checker taking on the responsibility of modeling functions by EvalCall, except for a few functions that it also models, but not with EvalCall, it feels clunky. I remember this one time when I [[ https://reviews.llvm.org/D68165?id=222257#inline-612891 | wanted to model alloca()]], which has a __builtin_alloca variant as well, and I spent a good couple hours figuring out that why the descpription {CDF_MaybeBuiltin, "alloca", 1} was causing crashes. Turns out that BuiltinFunctionsChecker EvalCalled the builtin version, but not the other.

In general, I don't think we have a well established stance on whether we use EvalCall, or PreCall+PostCall, and what roles do batch modeling checkers like StdLibraryFunctionsChecker, GenericTaintChecker and BuiltinFunctionsChecker should play alongside mode sophisticated modeling checkers like CStringModeling, DynamicMemoryModeling and StreamChecker. I bet some of us have ideas how this is supposed to be done, but I feel like there is no unified, agreed upon road we're moving towards. This problem came up in D69662 as well. D75612 introduces EvalCalls to StreamChecker, and I was puzzled why this hasn't been a thing in all similar chekers like MallocChecker, which I commented on in D81315#2079457.

It would also be great if we could enforce that no two checkers evaluate the same function.

None of what I said is a problem introduced by this specific patch, but it definitely amplifies it with the modeling of functions such as strncasecmp (should be the responsibility of CStringModeling), strdup (MallocChecker) and popen (StreamChecker). I'd like to see something set in stone, such as what you're eluding to in D75612#inline-690087, that the role of this checker is to model numerical (null-ness, ranges, etc) pre- and post conditions for standard functions, nothing less, nothing more, and it is technically an error to reimplement this in other checkers. If that is the goal, I guess we have to implement tests for individual functions added in this patch to make sure that these conditions aren't already checked elsewhere.

I didn't read this patch line-by-line, but I trust that its okay if Endre did so.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
352โ€“364

D78118#inline-723679 Please mark this Hidden.

358โ€“363

I'm fine with not hiding this, but as a user, why would I never not enable it? If the reasoning is that this isn't ready quite yet, change Released to InAlpha. Otherwise, either mark this Hidden or give a user friendly explanation as to what is there to gain/lose by tinkering with this option.

martong marked 8 inline comments as done.Jun 24 2020, 7:12 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
352โ€“364

Ok, it missed my attention about that back then.

358โ€“363

Yes, thanks, I wanted to make it similar to alpha checkers, now I know it is InAlpha that I need.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
740

There may be many Decls in the lookup result, but I don't know based on which property would it make sense to sort them. They are essentially stored in the order of creation, so as the parser reads an upcoming Decl.

1591

Yeah, good catch, thanks.

martong updated this revision to Diff 273018.Jun 24 2020, 7:12 AM
martong marked 4 inline comments as done.
  • Add 'Hide' and 'InAlpha', move Off_tMax inside the brace
gamesh411 accepted this revision.Jun 24 2020, 7:51 AM

I see a lot of NoEvalCall, but I wonder whether modifying errno warrants this. Shouldn't we have a alongside NoEvalCall and EvalCallAsPure an EvalCallAsPureButInvalidateErrno invalidation kind?

Also, I'm kind of worried by this checker taking on the responsibility of modeling functions by EvalCall, except for a few functions that it also models, but not with EvalCall, it feels clunky. I remember this one time when I [[ https://reviews.llvm.org/D68165?id=222257#inline-612891 | wanted to model alloca()]], which has a __builtin_alloca variant as well, and I spent a good couple hours figuring out that why the descpription {CDF_MaybeBuiltin, "alloca", 1} was causing crashes. Turns out that BuiltinFunctionsChecker EvalCalled the builtin version, but not the other.

In general, I don't think we have a well established stance on whether we use EvalCall, or PreCall+PostCall, and what roles do batch modeling checkers like StdLibraryFunctionsChecker, GenericTaintChecker and BuiltinFunctionsChecker should play alongside mode sophisticated modeling checkers like CStringModeling, DynamicMemoryModeling and StreamChecker. I bet some of us have ideas how this is supposed to be done, but I feel like there is no unified, agreed upon road we're moving towards. This problem came up in D69662 as well. D75612 introduces EvalCalls to StreamChecker, and I was puzzled why this hasn't been a thing in all similar chekers like MallocChecker, which I commented on in D81315#2079457.

It would also be great if we could enforce that no two checkers evaluate the same function.

None of what I said is a problem introduced by this specific patch, but it definitely amplifies it with the modeling of functions such as strncasecmp (should be the responsibility of CStringModeling), strdup (MallocChecker) and popen (StreamChecker). I'd like to see something set in stone, such as what you're eluding to in D75612#inline-690087, that the role of this checker is to model numerical (null-ness, ranges, etc) pre- and post conditions for standard functions, nothing less, nothing more, and it is technically an error to reimplement this in other checkers. If that is the goal, I guess we have to implement tests for individual functions added in this patch to make sure that these conditions aren't already checked elsewhere.

I didn't read this patch line-by-line, but I trust that its okay if Endre did so.

I find this piece of information extremely valuable.
This description about the state we are in right now should prove very educational to whoever embarking on getting the evalCall business done.

I tried to match every signature to the corresponding constraints, and as I found no glaring errors, I will accept this.

This revision is now accepted and ready to land.Jun 24 2020, 7:51 AM

Thanks for the review guys!
I understand there are concerns about evalCall, so I am trying to dissolve those concerns below. :)

I see a lot of NoEvalCall, but I wonder whether modifying errno warrants this.

Maybe there is a misunderstanding here. Using NoEvalCall results that the function may be evaluated conservatively or evaluated by any other checker:

case NoEvalCall:
  // Summary tells us to avoid performing eval::Call. The function is possibly
  // evaluated by another checker, or evaluated conservatively.
  return false;

In my understanding, it means that errno could be invalidated by any other checker. Consequently, these changes made here are not affecting any existing modeling of errno. (We use NoEvalCall in all functions in this patch.)

Shouldn't we have a alongside NoEvalCall and EvalCallAsPure an EvalCallAsPureButInvalidateErrno invalidation kind?

I'd rather not confuse the terms here. If a function is pure that means it does not have any side effects, i.e. it must not modify the global errno either. On the other hand, I see your point, and this is a good point. But maybe a different naming would be needed: what about EvalCallHereAndInvalidateErrno?
Since all of the functions here are not pure - they have NoEvalCall set -, probably we should have just simply InvalidateErrno rather, what do you think? I agree, that this checker is a good place to invalidate errno where applicable, but If you don't mind I'd rather do that in a subsequent patch to keep the incremental approach.

It would also be great if we could enforce that no two checkers evaluate the same function.

Yes, that would be great, but I don't see any clear solution to achieve that right now, this is really challenging. One very immature idea: Maybe we could have a special option in the engine that is used only in testing and which would let the evaluation to go further even if a checker returned true in evalCall. And if we find a second event like that then we could assert.

None of what I said is a problem introduced by this specific patch

Well, perhaps we should have had this conversation in a form of an RFC in cfe-dev instead. Next time maybe. Now, I am afraid that the review of the concrete changes in this patch are being neglected because the focus may be shifted here on the evalCall returns true problem (which does not happen in this patch).

... the modeling of functions such as strncasecmp (should be the responsibility of CStringModeling), strdup (MallocChecker) and popen (StreamChecker). I'd like to see something set in stone, such as what you're eluding to in D75612#inline-690087, that the role of this checker is to model numerical (null-ness, ranges, etc) pre- and post conditions for standard functions, nothing less, nothing more, and it is technically an error to reimplement this in other checkers. If that is the goal, I guess we have to implement tests for individual functions added in this patch to make sure that these conditions aren't already checked elsewhere.

Yes, there are many standard functions that are modeled in different checkers. Ideally all functions would be modeled in just one checker, I agree.

One goals is to add argument constraints to functions in the C, C++ and POSIX standard. Some of these argument constraints are indeed handled in e.g. CStringModeling. However, it is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. Sooner or later popen would be modeled in StreamChecker and then we may remove the summary from here. But if we leave it here that would not be a problem either. I consider this checker as a generic checker and StreamChecker or CStringModeling as a specialized checker. The specialized checkers should be a weak dependency to this generic checker, so any bug related to e.g. popen is prioritized to StreamChecker. That's exactly what we did in D79420.

The other very important goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the problem is non-existent. Perhaps we should divide the checker to two different checkers once we reach that point.

I didn't read this patch line-by-line, but I trust that its okay if Endre did so.

Well, okay :) But, at least one body should go through each line, otherwise how can we make sure that I don't inject some blunder in any of those lines? I don't expect you to check the specification of each function in the POSIX standard, just to have at least a quick look whether the argument constraints match the signature of the functions.

We use NoEvalCall in all functions in this patch

I just now realized that it is actually not true, sorry about that. So, seems like fnmatch, strcasecmp, strncasecmp are EvalCallAsPure. I could skip them from this patch, if you guys consider this as dangerous and to bolster my previous comment.

Szelethus added a comment.EditedJun 24 2020, 9:43 AM

I went through this a lot more thoroughly, as well as the previous patch, and this looks great, especially for an alpha option. I will admit, I'm a bit concerned by the lack of individual tests (what if a stupid interaction with another checker causes issues?), but I feel the need for a scale-able solution and your debug functions serve to help on this. If you don't mind, before I formally accept, it would be great to talk this over in a meeting and/or on cfe-dev, just so that I'm a bit more in the clear on what the direction is.

None of what I said is a problem introduced by this specific patch

Well, perhaps we should have had this conversation in a form of an RFC in cfe-dev instead. Next time maybe. Now, I am afraid that the review of the concrete changes in this patch are being neglected because the focus may be shifted here on the evalCall returns true problem (which does not happen in this patch).

Sure, good point, its just that this is the patch in my eye that really grows the impact of this checker (and transitively, the scope of the evalCall problem) above a threshold, which is why my concern was voiced here. But you're totally right, important discussions on design that is outside the scope of a single checker should have more visibility.

But maybe a different naming would be needed: what about EvalCallHereAndInvalidateErrno?
Since all of the functions here are not pure - they have NoEvalCall set -, probably we should have just simply InvalidateErrno rather, what do you think? I agree, that this checker is a good place to invalidate errno where applicable, but If you don't mind I'd rather do that in a subsequent patch to keep the incremental approach.

Either is good, and it would be better in a followup patch indeed.

Yes, there are many standard functions that are modeled in different checkers. Ideally all functions would be modeled in just one checker, I agree.

Thats not the point I wanted to get across -- its fine if multiple checkers model different aspects of the same function, but it should be done consistently, which at the moment isn't. It would be nice to have an agreement (even if that later changes) on where we see function modeling when this checker is a bit more mature. That also deserves its own discussion on cfe-dev.

[...] Sooner or later popen would be modeled in StreamChecker and then we may remove the summary from here. But if we leave it here that would not be a problem either. I consider this checker as a generic checker and StreamChecker or CStringModeling as a specialized checker. The specialized checkers should be a weak dependency to this generic checker, so any bug related to e.g. popen is prioritized to StreamChecker. That's exactly what we did in D79420.

Or, numerical constraints would be checked here, the mentioned checkers would strongly depend on this one. That way, ranges and nullability would be focused here, and more specific functionalities in their respective checkers.

I didn't read this patch line-by-line, but I trust that its okay if Endre did so.

Well, okay :) But, at least one body should go through each line, otherwise how can we make sure that I don't inject some blunder in any of those lines? I don't expect you to check the specification of each function in the POSIX standard, just to have at least a quick look whether the argument constraints match the signature of the functions.

Sure! I would not accept if I didn't have confirmation that someone (possibly me) did do that! :) Just wanted to get the high level stuff out of the way.

Szelethus accepted this revision.Jul 1 2020, 6:32 AM

The changes proposed in mailing list thread (http://lists.llvm.org/pipermail/cfe-dev/2020-June/066017.html) seem ortogonal to this change. Sorry for the slack today, I just peeked around in MallocChecker to look for conflicts, but it seems like there isn't any :) LGTM!

This revision was automatically updated to reflect the committed changes.