Adding file handling functions from the POSIX standard (2017).
A new checker option is introduced to enable them.
In follow-up patches I am going to upstream networking, pthread, and other
groups of POSIX functions.
Differential D82288
[analyzer][StdLibraryFunctionsChecker] Add POSIX file handling functions martong on Jun 22 2020, 2:40 AM. Authored by
Details Adding file handling functions from the POSIX standard (2017). In follow-up patches I am going to upstream networking, pthread, and other
Diff Detail
Event TimelineComment Actions Good job, I really fancy the Summary syntax ๐
Comment Actions I see a lot of NoEvalCall, but I wonder whether modifying errno warrants this. Shouldn't we have a alongside NoEvalCall and EvalCallAsPure an EvalCallAsPureButInvalidateErrno invalidation kind? Also, I'm kind of worried by this checker taking on the responsibility of modeling functions by EvalCall, except for a few functions that it also models, but not with EvalCall, it feels clunky. I remember this one time when I [[ https://reviews.llvm.org/D68165?id=222257#inline-612891 | wanted to model alloca()]], which has a __builtin_alloca variant as well, and I spent a good couple hours figuring out that why the descpription {CDF_MaybeBuiltin, "alloca", 1} was causing crashes. Turns out that BuiltinFunctionsChecker EvalCalled the builtin version, but not the other. In general, I don't think we have a well established stance on whether we use EvalCall, or PreCall+PostCall, and what roles do batch modeling checkers like StdLibraryFunctionsChecker, GenericTaintChecker and BuiltinFunctionsChecker should play alongside mode sophisticated modeling checkers like CStringModeling, DynamicMemoryModeling and StreamChecker. I bet some of us have ideas how this is supposed to be done, but I feel like there is no unified, agreed upon road we're moving towards. This problem came up in D69662 as well. D75612 introduces EvalCalls to StreamChecker, and I was puzzled why this hasn't been a thing in all similar chekers like MallocChecker, which I commented on in D81315#2079457. It would also be great if we could enforce that no two checkers evaluate the same function. None of what I said is a problem introduced by this specific patch, but it definitely amplifies it with the modeling of functions such as strncasecmp (should be the responsibility of CStringModeling), strdup (MallocChecker) and popen (StreamChecker). I'd like to see something set in stone, such as what you're eluding to in D75612#inline-690087, that the role of this checker is to model numerical (null-ness, ranges, etc) pre- and post conditions for standard functions, nothing less, nothing more, and it is technically an error to reimplement this in other checkers. If that is the goal, I guess we have to implement tests for individual functions added in this patch to make sure that these conditions aren't already checked elsewhere. I didn't read this patch line-by-line, but I trust that its okay if Endre did so.
Comment Actions I find this piece of information extremely valuable. I tried to match every signature to the corresponding constraints, and as I found no glaring errors, I will accept this. Comment Actions Thanks for the review guys! Maybe there is a misunderstanding here. Using NoEvalCall results that the function may be evaluated conservatively or evaluated by any other checker: case NoEvalCall: // Summary tells us to avoid performing eval::Call. The function is possibly // evaluated by another checker, or evaluated conservatively. return false; In my understanding, it means that errno could be invalidated by any other checker. Consequently, these changes made here are not affecting any existing modeling of errno. (We use NoEvalCall in all functions in this patch.)
I'd rather not confuse the terms here. If a function is pure that means it does not have any side effects, i.e. it must not modify the global errno either. On the other hand, I see your point, and this is a good point. But maybe a different naming would be needed: what about EvalCallHereAndInvalidateErrno?
Yes, that would be great, but I don't see any clear solution to achieve that right now, this is really challenging. One very immature idea: Maybe we could have a special option in the engine that is used only in testing and which would let the evaluation to go further even if a checker returned true in evalCall. And if we find a second event like that then we could assert.
Well, perhaps we should have had this conversation in a form of an RFC in cfe-dev instead. Next time maybe. Now, I am afraid that the review of the concrete changes in this patch are being neglected because the focus may be shifted here on the evalCall returns true problem (which does not happen in this patch).
Yes, there are many standard functions that are modeled in different checkers. Ideally all functions would be modeled in just one checker, I agree. One goals is to add argument constraints to functions in the C, C++ and POSIX standard. Some of these argument constraints are indeed handled in e.g. CStringModeling. However, it is very easy to gather these argument constraints and apply them massively. By doing this, the analysis can be more precise, so, I see a lot of gain here. Sooner or later popen would be modeled in StreamChecker and then we may remove the summary from here. But if we leave it here that would not be a problem either. I consider this checker as a generic checker and StreamChecker or CStringModeling as a specialized checker. The specialized checkers should be a weak dependency to this generic checker, so any bug related to e.g. popen is prioritized to StreamChecker. That's exactly what we did in D79420. The other very important goal would be to make it possible to add argument constraints (or branches/cases) to any library functions. These libraries may never be modeled in the analyzer. In this case the problem is non-existent. Perhaps we should divide the checker to two different checkers once we reach that point.
Well, okay :) But, at least one body should go through each line, otherwise how can we make sure that I don't inject some blunder in any of those lines? I don't expect you to check the specification of each function in the POSIX standard, just to have at least a quick look whether the argument constraints match the signature of the functions. Comment Actions
I just now realized that it is actually not true, sorry about that. So, seems like fnmatch, strcasecmp, strncasecmp are EvalCallAsPure. I could skip them from this patch, if you guys consider this as dangerous and to bolster my previous comment. Comment Actions I went through this a lot more thoroughly, as well as the previous patch, and this looks great, especially for an alpha option. I will admit, I'm a bit concerned by the lack of individual tests (what if a stupid interaction with another checker causes issues?), but I feel the need for a scale-able solution and your debug functions serve to help on this. If you don't mind, before I formally accept, it would be great to talk this over in a meeting and/or on cfe-dev, just so that I'm a bit more in the clear on what the direction is. Sure, good point, its just that this is the patch in my eye that really grows the impact of this checker (and transitively, the scope of the evalCall problem) above a threshold, which is why my concern was voiced here. But you're totally right, important discussions on design that is outside the scope of a single checker should have more visibility.
Either is good, and it would be better in a followup patch indeed.
Thats not the point I wanted to get across -- its fine if multiple checkers model different aspects of the same function, but it should be done consistently, which at the moment isn't. It would be nice to have an agreement (even if that later changes) on where we see function modeling when this checker is a bit more mature. That also deserves its own discussion on cfe-dev.
Or, numerical constraints would be checked here, the mentioned checkers would strongly depend on this one. That way, ranges and nullability would be focused here, and more specific functionalities in their respective checkers.
Sure! I would not accept if I didn't have confirmation that someone (possibly me) did do that! :) Just wanted to get the high level stuff out of the way. Comment Actions The changes proposed in mailing list thread (http://lists.llvm.org/pipermail/cfe-dev/2020-June/066017.html) seem ortogonal to this change. Sorry for the slack today, I just peeked around in MallocChecker to look for conflicts, but it seems like there isn't any :) LGTM! |
I'm fine with not hiding this, but as a user, why would I never not enable it? If the reasoning is that this isn't ready quite yet, change Released to InAlpha. Otherwise, either mark this Hidden or give a user friendly explanation as to what is there to gain/lose by tinkering with this option.