Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Unit tests: pass. 62418 tests passed, 0 failed and 845 were skipped.
clang-tidy: pass.
clang-format: pass.
Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml
Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
452 | Yeah, a must-have for this check to be enabled by default would be to be able to provide a specific warning message for every function. I guess we could include them in the summaries as an extra argument of ArgConstraint. | |
455 | Let's test our notes. That'll be especially important when we get to non-concrete values, because the visitor might need to be expanded (or we might need a completely new visitor). |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
442 | Dealing with only concrete ints might be a good start but we might want to handle symbolic cases in the future like: if (v > 255) return isalpha(v); I am ok with not addig this in the first version but adding TODOs and test cases upfront cannot hurt. So basivally, I was wondering if we should query the solver for the result instead of matching the sval kind and just early return if we do not want to support a specific kind. |
I wouldn't like to see reports emitted by a checker that resides in apiModeling. Could we create a new one? Some checkers, like the IteratorChecker, MallocChecker and CStringChecker implement a variety of user-facing checkers within the same class, that is also an option, if creating a new checker class is too much of a hassle.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
704–706 | This is true for the rest of the summaries as well, but shouldn't we retrieve the unsigned char size from ASTContext? |
Yes, we could split the warning emitting part to a new checker. My concern with that is in that case we would have the argument constraining part in checkPostCall still in this checker, because that is part of the modelling. And actually it makes sense to apply the argument constraints only if we know for sure that they are not violated. The violation then would be checked in the new checker, this seems a bit awkward to me. Because checking the violation of the constraints and applying the constraints seems to be a cohesive action to me. I mean it would not even make sense to turn off the warning checker, because then we'd be applying the constraints blindly.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
704–706 | Yes this is a good idea. I will do this. What bothers me really much, however, is that we should handle EOF in a platform dependent way as well ... and I have absolutely no idea how to do that given that is defined by a macro in a platform specific header file. I am desperately in need for help and ideas about how could we get the value of EOF for the analysed platform. |
What I mean by that is that we must do over-approximation if the argument is symbolic. I.e. we presume that the constraints do hold otherwise the program would be ill-formed and there is no point to continue the analysis on this path. It is very similar to what we do in case of the DivZero or the NullDeref Checkers: if there is no violation (no warning) and the variable is symbolic then we constrain the value by the condition. E.g. in DivZero::checkPreStmt we have:
// If we get here, then the denom should not be zero. We abandon the implicit // zero denom case for now. C.addTransition(stateNotZero);
Strictly speaking, these transitions should be part of the modeling then in this sense (and they should be in PostStmt?). Still they are not separated into a different checker.
What I mean by that is that we must do over-approximation if the argument is symbolic. I.e. we presume that the constraints do hold otherwise the program would be ill-formed and there is no point to continue the analysis on this path.
Sorry, that's actually under-approximation because we elide paths.
Based on our verbal discussion with @Szelethus and @steakhal and based on the mailing archives, I am going to do the following changes:
- Add a new checker that is implemented in the StdLibraryFunctionsChecker class.
- This new checker if switched on is responsible for emitting the warning. Even if this is turned off, the sink node is generated if the argument violates the given condition.
- This means, the new checker has the sole responsibility of emitting the warning, but nothing more.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
442 | This check works now with concrete int values. We have a known value and a list of ranges with known limits, so testing for in any of the ranges does work the same way as testing for out of all ranges. And testing if the value is inside one of the ranges is more simple code. But I think the symbolic evaluation with "eval" and "assume" functions would be more generic here (and more simple code). Then the way of cutting-of the bad ranges is usable (probably still there is other solution). | |
503 | If evalCall is used it could be more simple to test and apply the constraints for arguments and return value in a "single step". |
Probably a better solution can be:
For every "case" build a single SVal that contains all argument constraints for that case. It is possible using multiple evalBinOp calls (with <=, >=, logical or) to build such a condition (or repeated calls to other assume functions to cut off outer ranges). If the condition can be satisfied (by assume) add the new state, the condition for return value can be added here too. Repeat this for every different case. If no applicable case is found none of the conditions can be assumed, this means argument constraint error.
- Add new Checker that does the report
- Refactor with negated RangeValues
- Add overload to findFunctionSummary
- Add tests for symbolic values
- Add test file for bug path
I've done a major refactor with the handling of argument constraints. I am now reusing ValueRange::apply in checkPreCall on "negated" value ranges to check if the constraint is violated.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
452 | What about warning messages with placeholders? E.g. "Argument constraint of {nth} argument of {fun} is not satisfied. The value is {v}, however it should be in {range}." No matter how it turns out, this should be handled in a different patch. | |
455 | Ok, I added a separate test file where the tests focus on the bug path. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
441 | StringRef |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
461 | Is this addTransition needed? It would be OK to call generateErrorNode with State. Even if not, adding the transition before should not be needed? | |
719–720 | Why is this {128, UCharMax} here and at the next entry needed? | |
728 | Is this ArgConstraint intentionally added only to isalnum? |
- Use StringRef for Msg
- Remove superfluous addTransition
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
461 | Yes, you are right it is superfluous, I removed it. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
719–720 | This is the local specific range , [128, 255]. There are characters like ä which we don't know if they are treated as an alphanumerical character or not. We can't really tell how a specific libc implementation classifies them. On the other hand, with English letters we can state the classes confidently. | |
728 | Yes, I wanted to create first the infrastructure and then later to add all these constraints to the rest of the summaries with new tests. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
165 | This default branch is not needed here (actually gives a compiler warning too). |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
704–706 | If the EOF is not used in the TU analyzed, then there would be no way to find the specific #define. |
It may be useful to make a "macro value map" kind of object. Some macros can be added to it as a string, and it is possible to lookup for an Expr if one of the added macros is used there. This can be done by checking the concrete (numeric) value of the Expr and compare to the value of the macro, or by checking if the expression comes from a macro and take this macro name (use string comparison). Such an object can be useful because the functionality is needed at more checkers, for example the ones I am working on (StreamChecker and ErrorReturnChecker too).
something like this:
class MacroUsageDetector { public: void addMacroName(StringRef MName); bool isMacroUsed(StringRef MName, Expr *E, ???); APSInt getMacroValue(StringRef MName); };
Or one that handles a single macro?
The high level idea and the implementation of the checker seems great. In general, things that you want to address in later patches should be stated in the code with a TODO. I wrote a couple nits that I don't want to delete, but maybe it'd be better to address them after the dependency patch is agreed upon.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
299 | How about we add an example as well? | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
9–10 | I suspect this comment is no longer relevant. | |
156 | Maybe complement would be a better name? That sounds a lot more like a set operation. Also, this function highlights well that inheritance might not be the best solution here. | |
197 | I think that is a rather poor example to help understand what list of list of ranges means :) -- Could you try to find something better? | |
437–445 | While I find your usage of lambdas fascinating, this one seems a bit unnecessary :) | |
440 | That is a TODO, rather :^) | |
clang/test/Analysis/std-c-library-functions-arg-constraints.c | ||
2–8 | Hmm, why do we have 2 different test files that essentially do the same? Shouldn't we only have a single one with analyzer-output=text? | |
clang/test/Analysis/std-c-library-functions.c | ||
1–32 | What a beautiful sight. Thanks. |
Is it sure that the signedness in the ranges is handled correctly? The EOF is a negative value but the RangeInt is unsigned type. The tryExpandAsInteger returns int too that is put into an unsigned RangeInt later. Probably it is better to use APSInt for the ranges? (The problem exists already before this change.)
Please see my previous answer to @gamesh411
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
299 | You mean like NonNull or other constraints? | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
9–10 | Uh, yes. | |
156 | Well, we check the argument constraint validity by trying to apply it's logical negation. In case of a range inclusion this is being out of that range. In case of non-null this is being null. And so on. The logic how we try to check an argument constraint is the same in all cases of the different constraints. And that is the point: in order to support a new kind of constraint we just have to figure out how to "apply" and "negate" one constraint. In my opinion this is a perfect case for polimorphism. | |
197 | Yeah, that part definitely should be reworded. | |
704–706 | I believe that the given standard C lib implementation (e.g. glibc) must provide a header for the prototypes of these functions where EOF is also defined transitively in any of the dependent system headers. Otherwise user code could misuse the value of EOF and thus the program would behave in an undefined manner. C99 clearly states that you should #include <ctype.h> to use isalhpa. | |
clang/test/Analysis/std-c-library-functions-arg-constraints.c | ||
2–8 | No, I wanted to have two different test files to test two different things: (1) We do have the constraints applied (here we don't care about the warnings and the path) | |
clang/test/Analysis/std-c-library-functions.c | ||
1–32 | Anytime :D |
That is not a problem, because finally in apply we use an APSInt that is constructed by considering the correct T type, e.g.:
const llvm::APSInt &Min = BVF.getValue(R[I].first, T);
We could consider RangeInt as a buffer that is big enough to hold the representation of the range values. The concrete interpretation of the bits (as T) is done by APSInt.
Just littering some more inlines, don't mind me :) Lets still wait on the dependency patch before updating.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
299 | Like Check constraints of arguments of C standard library functions, such as whether the parameter of isalpha is in the range [0, 255] or is EOF. | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
91–92 | How about ValueConstraintRef? | |
156 | We agreed on inheritance in the previous patch, and regarding the name, sure, leave it as-is. :) |
Looks great as long as other reviewers are happy, thanks!
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | Maybe we should add an assertion that the same argument isn't specified multiple times. |
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
299 | Ok, done. | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
91–92 | Yeah, we have ProgramStateRef and SymbolRef. And both are actually just synonyms to smart pointers. I'd rather not call a pointer as a reference, because that can be confusing when reading the code. E.g. when I see that we return with a nullptr from a function that can return with a ...Ref I start to scratch my head. | |
197 | I added an example with isalpha. | |
437–445 | Ok I moved it to be a member function named ReportBug. | |
456–458 | I think there could be cases when we want to have e.g. a not-null constraint on the 1st argument, but also we want to express that the 1st argument's size is described by the 2nd argument. I am planning to implement such a constraints in the future. In that case we would have two constraints on the 1st argument and the assert would fire. |
- tmp -> Tmp
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
165 |
I am not sure why I thought that that's not needed, actually we need that. (Perhaps an intermediate version returned in each cases.) |
LGTM, aside from some checker tagging nightmare. Its a bit easy to mess up, please allow me to have a final look before commiting! :)
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
298 | Just noticed, this checker still lies in the apiModeling package. Could we find a more appropriate place? | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
91–92 | Sure, I'm sold. | |
264 | By passing this, the error message will be tied to the modeling checker, not to the one you just added. BugType has a constructor that accepts a string instead, pass CheckNames[CK_StdCLibraryFunctionArgsChecker] in there :) Also, how about BT_InvalidArgument or something? |
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
---|---|---|
298 | Technically speaking this is still api modeling. In midterm we'd like to add support for more libc functions, gnu and posix functions, they are all library functions i.e. they provide some api. | |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
112 | Not all of the constraint classes must implement this. Right now, e.g. the ComparisonConstraint does not implement this, because there is no such summary (yet) that uses a ComparisonConstraint as an argument constraint. | |
197 | Not exactly. A branch represents a path in the exploded graph of a function (which is a tree). | |
264 | Thanks, good catch, I did not know about that. Please note that using CheckNames requires that we change the BT member to be lazily initialized. Because CheckNames is initialized only after the checker itself is created, thus we cannot initialize BT during the checkers construction, b/c that would be before CheckNames is set. So, I changed BT to be a unique_ptr and it is being lazily initialized in reportBug. | |
301 | Yeah, can't get used to this strange naming convention that LLVM uses. Fixed it. |
- Add comments about what is a branch
- Do not use 'this' for BugType
- Lazily init BT and BT -> BT_InvalidArg
- ReportBug -> reportBug
Whoo! The patch looks great and well thought out, the tests look like they cover everything and we also talked about plans for future patches. Excellent!
I left a nit about merging the test files, but I'll leave it up to you to address or ignore it.
clang/test/Analysis/std-c-library-functions-arg-constraints.c | ||
---|---|---|
2–8 | What if we had different -verifys? clang/test/Analysis/track-conditions.cpp is a great example. |
Thanks for the review guys!
clang/test/Analysis/std-c-library-functions-arg-constraints.c | ||
---|---|---|
2–8 | Yeah, that's a very good approach, I just changed it like that. :) |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | Wait, i misunderstood the code. It's even worse than that: you're adding transitions in a loop, so it'll cause state splits for every constraint. Because you do not intend to create multiple branches here, there needs to be exactly one addTransition performed every time checkPreCall is called. I.e., for now this code is breaking everything whenever there's more than one constraint, regardless of whether it's on the same argument. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | Yeah, that's a very good catch, thanks! I am going to prepare a patch to fix this soon. My idea is to store the SuccessSt and apply the next argument constraint on that. And once the loop is finished I'll have call the addTransition(). |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | Yup, that's the common thing to do in such cases. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | While we're at it, could you try to come up with a runtime assertion that'll help us prevent these mistakes? Like, dunno, make CheckerContext crash whenever there's more than one branch being added, and then add a method to opt out when it's actually necessary to add more transitions (i.e., the user would say C.setMaxTransitions(2) at the beginning of their checker callback whenever they need to make a state split, defaulting to 1). It's a bit tricky because i still want to allow multiple transitions when they allow one branch (i.e., transitions chained together) but i think it'll take a lot of review anxiety from me because it's a very dangerous mistake to make and for now code review is the only way to catch it. So, yay, faster code reviews. |
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp | ||
---|---|---|
456–458 | Hmm I see your point and I agree this would be a valuable sanity check. But if you don't mind I'd like to address this in a different and stand-alone patch (independently from the quick-fix https://reviews.llvm.org/D76790) because it does not seem to be trivial for me. My first concern is this: if we have 1 as the default value for maxTranisitions then we should add an extra C.setMaxTransitions(N) in every checker callback that does a state split, is that right? |
Just noticed, this checker still lies in the apiModeling package. Could we find a more appropriate place?