Page MenuHomePhabricator

[analyzer] StdLibraryFunctionsChecker: Add argument constraints
ClosedPublic

Authored by martong on Feb 3 2020, 8:18 AM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
martong added inline comments.Feb 24 2020, 6:12 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
719–720

This is the local specific range , [128, 255]. There are characters like ä which we don't know if they are treated as an alphanumerical character or not. We can't really tell how a specific libc implementation classifies them. On the other hand, with English letters we can state the classes confidently.

728

Yes, I wanted to create first the infrastructure and then later to add all these constraints to the rest of the summaries with new tests.

martong marked an inline comment as done.Feb 24 2020, 10:00 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
165

This default branch is not needed here (actually gives a compiler warning too).

gamesh411 added inline comments.Feb 27 2020, 1:12 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
704–706

If the EOF is not used in the TU analyzed, then there would be no way to find the specific #define.
Another approach would be to check if the value is defined by an expression that is the EOF define (maybe transitively?).

It may be useful to make a "macro value map" kind of object. Some macros can be added to it as a string, and it is possible to lookup for an Expr if one of the added macros is used there. This can be done by checking the concrete (numeric) value of the Expr and compare to the value of the macro, or by checking if the expression comes from a macro and take this macro name (use string comparison). Such an object can be useful because the functionality is needed at more checkers, for example the ones I am working on (StreamChecker and ErrorReturnChecker too).

something like this:

class MacroUsageDetector {
public:
  void addMacroName(StringRef MName);
  bool isMacroUsed(StringRef MName, Expr *E, ???);
  APSInt getMacroValue(StringRef MName);
};

Or one that handles a single macro?

The high level idea and the implementation of the checker seems great. In general, things that you want to address in later patches should be stated in the code with a TODO. I wrote a couple nits that I don't want to delete, but maybe it'd be better to address them after the dependency patch is agreed upon.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

How about we add an example as well?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
9–10

I suspect this comment is no longer relevant.

156

Maybe complement would be a better name? That sounds a lot more like a set operation. Also, this function highlights well that inheritance might not be the best solution here.

197

I think that is a rather poor example to help understand what list of list of ranges means :) -- Could you try to find something better?

437–445

While I find your usage of lambdas fascinating, this one seems a bit unnecessary :)

440

That is a TODO, rather :^)

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

Hmm, why do we have 2 different test files that essentially do the same? Shouldn't we only have a single one with analyzer-output=text?

clang/test/Analysis/std-c-library-functions.c
1–32

What a beautiful sight. Thanks.

balazske added a comment.EditedMar 2 2020, 8:09 AM

Is it sure that the signedness in the ranges is handled correctly? The EOF is a negative value but the RangeInt is unsigned type. The tryExpandAsInteger returns int too that is put into an unsigned RangeInt later. Probably it is better to use APSInt for the ranges? (The problem exists already before this change.)

martong marked 9 inline comments as done.Mar 6 2020, 9:51 AM

It may be useful to make a "macro value map" kind of object. Some macros can be added to it as a string, and it is possible to lookup for an Expr if one of the added macros is used there. This can be done by checking the concrete (numeric) value of the Expr and compare to the value of the macro, or by checking if the expression comes from a macro and take this macro name (use string comparison). Such an object can be useful because the functionality is needed at more checkers, for example the ones I am working on (StreamChecker and ErrorReturnChecker too).

Please see my previous answer to @gamesh411

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

You mean like NonNull or other constraints?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
9–10

Uh, yes.

156

Well, we check the argument constraint validity by trying to apply it's logical negation. In case of a range inclusion this is being out of that range. In case of non-null this is being null. And so on. The logic how we try to check an argument constraint is the same in all cases of the different constraints. And that is the point: in order to support a new kind of constraint we just have to figure out how to "apply" and "negate" one constraint. In my opinion this is a perfect case for polimorphism.

197

Yeah, that part definitely should be reworded.

704–706

I believe that the given standard C lib implementation (e.g. glibc) must provide a header for the prototypes of these functions where EOF is also defined transitively in any of the dependent system headers. Otherwise user code could misuse the value of EOF and thus the program would behave in an undefined manner.

C99 clearly states that you should #include <ctype.h> to use isalhpa.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

No, I wanted to have two different test files to test two different things: (1) We do have the constraints applied (here we don't care about the warnings and the path)
(2) Check that we have a warning with the proper tracking and notes.

clang/test/Analysis/std-c-library-functions.c
1–32

Anytime :D

martong marked 2 inline comments as done.Mar 6 2020, 10:00 AM

Is it sure that the signedness in the ranges is handled correctly? The EOF is a negative value but the RangeInt is unsigned type. The tryExpandAsInteger returns int too that is put into an unsigned RangeInt later. Probably it is better to use APSInt for the ranges? (The problem exists already before this change.)

That is not a problem, because finally in apply we use an APSInt that is constructed by considering the correct T type, e.g.:

const llvm::APSInt &Min = BVF.getValue(R[I].first, T);

We could consider RangeInt as a buffer that is big enough to hold the representation of the range values. The concrete interpretation of the bits (as T) is done by APSInt.

Just littering some more inlines, don't mind me :) Lets still wait on the dependency patch before updating.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

Like

Check constraints of arguments of C standard library functions, such as whether the parameter of isalpha is in the range [0, 255] or is EOF.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

How about ValueConstraintRef?

156

We agreed on inheritance in the previous patch, and regarding the name, sure, leave it as-is. :)

NoQ accepted this revision.Mar 15 2020, 9:51 PM

Looks great as long as other reviewers are happy, thanks!

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Maybe we should add an assertion that the same argument isn't specified multiple times.

martong marked 16 inline comments as done.Mar 17 2020, 10:25 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

Ok, done.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

Yeah, we have ProgramStateRef and SymbolRef. And both are actually just synonyms to smart pointers. I'd rather not call a pointer as a reference, because that can be confusing when reading the code. E.g. when I see that we return with a nullptr from a function that can return with a ...Ref I start to scratch my head.

197

I added an example with isalpha.

437–445

Ok I moved it to be a member function named ReportBug.

456–458

I think there could be cases when we want to have e.g. a not-null constraint on the 1st argument, but also we want to express that the 1st argument's size is described by the 2nd argument. I am planning to implement such a constraints in the future. In that case we would have two constraints on the 1st argument and the assert would fire.

martong updated this revision to Diff 250820.Mar 17 2020, 10:25 AM
martong marked 4 inline comments as done.
martong removed a subscriber: DenisDvlp.
  • Address review comments
martong updated this revision to Diff 250869.Mar 17 2020, 12:45 PM
martong marked an inline comment as done.
  • tmp -> Tmp
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
165

This default branch is not needed here (actually gives a compiler warning too).

I am not sure why I thought that that's not needed, actually we need that. (Perhaps an intermediate version returned in each cases.)

Szelethus requested changes to this revision.Mar 17 2020, 8:05 PM

LGTM, aside from some checker tagging nightmare. Its a bit easy to mess up, please allow me to have a final look before commiting! :)

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
298

Just noticed, this checker still lies in the apiModeling package. Could we find a more appropriate place?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

Sure, I'm sold.

264

By passing this, the error message will be tied to the modeling checker, not to the one you just added. BugType has a constructor that accepts a string instead, pass CheckNames[CK_StdCLibraryFunctionArgsChecker] in there :) Also, how about BT_InvalidArgument or something?

This revision now requires changes to proceed.Mar 17 2020, 8:05 PM
balazske added inline comments.Mar 18 2020, 2:27 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
112

Is it better done with = 0?

197

The "branches" are the structures that define relations between arguments and return values? This could be included in the description.

301

This should be called reportBug.

martong marked 11 inline comments as done.Mar 18 2020, 7:46 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
298

Technically speaking this is still api modeling. In midterm we'd like to add support for more libc functions, gnu and posix functions, they are all library functions i.e. they provide some api.
Of course in long term, we'd like to experiment by getting some constraints from IR/Attributor, but we are still far from there.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
112

Not all of the constraint classes must implement this. Right now, e.g. the ComparisonConstraint does not implement this, because there is no such summary (yet) that uses a ComparisonConstraint as an argument constraint.

197

Not exactly. A branch represents a path in the exploded graph of a function (which is a tree).
So, a branch is a series of assumptions. In other words, branches represent split states and additional assumptions on top of the splitting assumption. I added this explanation to the comments.

264

Thanks, good catch, I did not know about that. Please note that using CheckNames requires that we change the BT member to be lazily initialized. Because CheckNames is initialized only after the checker itself is created, thus we cannot initialize BT during the checkers construction, b/c that would be before CheckNames is set. So, I changed BT to be a unique_ptr and it is being lazily initialized in reportBug.

301

Yeah, can't get used to this strange naming convention that LLVM uses. Fixed it.

martong updated this revision to Diff 251083.Mar 18 2020, 7:46 AM
martong marked 5 inline comments as done.
  • Add comments about what is a branch
  • Do not use 'this' for BugType
  • Lazily init BT and BT -> BT_InvalidArg
  • ReportBug -> reportBug
Szelethus accepted this revision.Mar 19 2020, 12:52 PM

Whoo! The patch looks great and well thought out, the tests look like they cover everything and we also talked about plans for future patches. Excellent!

I left a nit about merging the test files, but I'll leave it up to you to address or ignore it.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

What if we had different -verifys? clang/test/Analysis/track-conditions.cpp is a great example.

This revision is now accepted and ready to land.Mar 19 2020, 12:52 PM
martong updated this revision to Diff 251648.Mar 20 2020, 8:18 AM
martong marked an inline comment as done.
  • Use prefixes for -verify to check different things in the same test file
martong marked an inline comment as done.Mar 20 2020, 8:19 AM

Thanks for the review guys!

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

Yeah, that's a very good approach, I just changed it like that. :)

This revision was automatically updated to reflect the committed changes.
NoQ added inline comments.Mar 25 2020, 12:06 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Wait, i misunderstood the code. It's even worse than that: you're adding transitions in a loop, so it'll cause state splits for every constraint. Because you do not intend to create multiple branches here, there needs to be exactly one addTransition performed every time checkPreCall is called. I.e., for now this code is breaking everything whenever there's more than one constraint, regardless of whether it's on the same argument.

martong marked an inline comment as done.Mar 25 2020, 7:33 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Yeah, that's a very good catch, thanks! I am going to prepare a patch to fix this soon. My idea is to store the SuccessSt and apply the next argument constraint on that. And once the loop is finished I'll have call the addTransition().

NoQ added inline comments.Mar 25 2020, 8:15 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Yup, that's the common thing to do in such cases.

NoQ added inline comments.Mar 25 2020, 9:29 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

While we're at it, could you try to come up with a runtime assertion that'll help us prevent these mistakes?

Like, dunno, make CheckerContext crash whenever there's more than one branch being added, and then add a method to opt out when it's actually necessary to add more transitions (i.e., the user would say C.setMaxTransitions(2) at the beginning of their checker callback whenever they need to make a state split, defaulting to 1). It's a bit tricky because i still want to allow multiple transitions when they allow one branch (i.e., transitions chained together) but i think it'll take a lot of review anxiety from me because it's a very dangerous mistake to make and for now code review is the only way to catch it. So, yay, faster code reviews.

I just created a quick fix for the issue: https://reviews.llvm.org/D76790

martong marked an inline comment as done.Mar 25 2020, 11:23 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Hmm I see your point and I agree this would be a valuable sanity check. But if you don't mind I'd like to address this in a different and stand-alone patch (independently from the quick-fix https://reviews.llvm.org/D76790) because it does not seem to be trivial for me.

My first concern is this: if we have 1 as the default value for maxTranisitions then we should add an extra C.setMaxTransitions(N) in every checker callback that does a state split, is that right?