This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] StdLibraryFunctionsChecker: Add argument constraints
ClosedPublic

Authored by martong on Feb 3 2020, 8:18 AM.

Diff Detail

Event Timeline

martong created this revision.Feb 3 2020, 8:18 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 3 2020, 8:18 AM

Unit tests: pass. 62418 tests passed, 0 failed and 845 were skipped.

clang-tidy: pass.

clang-format: pass.

Build artifacts: diff.json, clang-tidy.txt, clang-format.patch, CMakeCache.txt, console-log.txt, test-results.xml

Pre-merge checks is in beta. Report issue. Please join beta or enable it for your project.

NoQ added inline comments.Feb 3 2020, 9:56 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
452

Yeah, a must-have for this check to be enabled by default would be to be able to provide a specific warning message for every function. I guess we could include them in the summaries as an extra argument of ArgConstraint.

455

Let's test our notes.

That'll be especially important when we get to non-concrete values, because the visitor might need to be expanded (or we might need a completely new visitor).

xazax.hun added inline comments.Feb 6 2020, 3:39 PM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
442

Dealing with only concrete ints might be a good start but we might want to handle symbolic cases in the future like:

if (v > 255) 
  return isalpha(v);

I am ok with not addig this in the first version but adding TODOs and test cases upfront cannot hurt.

So basivally, I was wondering if we should query the solver for the result instead of matching the sval kind and just early return if we do not want to support a specific kind.

Szelethus requested changes to this revision.Feb 7 2020, 4:40 AM

I wouldn't like to see reports emitted by a checker that resides in apiModeling. Could we create a new one? Some checkers, like the IteratorChecker, MallocChecker and CStringChecker implement a variety of user-facing checkers within the same class, that is also an option, if creating a new checker class is too much of a hassle.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
704–706

This is true for the rest of the summaries as well, but shouldn't we retrieve the unsigned char size from ASTContext?

This revision now requires changes to proceed.Feb 7 2020, 4:40 AM

I wouldn't like to see reports emitted by a checker that resides in apiModeling. Could we create a new one? Some checkers, like the IteratorChecker, MallocChecker and CStringChecker implement a variety of user-facing checkers within the same class, that is also an option, if creating a new checker class is too much of a hassle.

Yes, we could split the warning emitting part to a new checker. My concern with that is in that case we would have the argument constraining part in checkPostCall still in this checker, because that is part of the modelling. And actually it makes sense to apply the argument constraints only if we know for sure that they are not violated. The violation then would be checked in the new checker, this seems a bit awkward to me. Because checking the violation of the constraints and applying the constraints seems to be a cohesive action to me. I mean it would not even make sense to turn off the warning checker, because then we'd be applying the constraints blindly.

martong marked an inline comment as done.Feb 7 2020, 8:24 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
704–706

Yes this is a good idea. I will do this.

What bothers me really much, however, is that we should handle EOF in a platform dependent way as well ... and I have absolutely no idea how to do that given that is defined by a macro in a platform specific header file. I am desperately in need for help and ideas about how could we get the value of EOF for the analysed platform.

I wouldn't like to see reports emitted by a checker that resides in apiModeling. Could we create a new one? Some checkers, like the IteratorChecker, MallocChecker and CStringChecker implement a variety of user-facing checkers within the same class, that is also an option, if creating a new checker class is too much of a hassle.

... And actually it makes sense to apply the argument constraints only if we know for sure that they are not violated. ...

What I mean by that is that we must do over-approximation if the argument is symbolic. I.e. we presume that the constraints do hold otherwise the program would be ill-formed and there is no point to continue the analysis on this path. It is very similar to what we do in case of the DivZero or the NullDeref Checkers: if there is no violation (no warning) and the variable is symbolic then we constrain the value by the condition. E.g. in DivZero::checkPreStmt we have:

// If we get here, then the denom should not be zero. We abandon the implicit
// zero denom case for now.
C.addTransition(stateNotZero);

Strictly speaking, these transitions should be part of the modeling then in this sense (and they should be in PostStmt?). Still they are not separated into a different checker.

What I mean by that is that we must do over-approximation if the argument is symbolic. I.e. we presume that the constraints do hold otherwise the program would be ill-formed and there is no point to continue the analysis on this path.

Sorry, that's actually under-approximation because we elide paths.

martong planned changes to this revision.Feb 10 2020, 5:33 AM
martong added a subscriber: steakhal.

Based on our verbal discussion with @Szelethus and @steakhal and based on the mailing archives, I am going to do the following changes:

  • Add a new checker that is implemented in the StdLibraryFunctionsChecker class.
  • This new checker if switched on is responsible for emitting the warning. Even if this is turned off, the sink node is generated if the argument violates the given condition.
  • This means, the new checker has the sole responsibility of emitting the warning, but nothing more.
martong updated this revision to Diff 244430.Feb 13 2020, 7:45 AM
  • Rebase to master
balazske added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
442

This check works now with concrete int values. We have a known value and a list of ranges with known limits, so testing for in any of the ranges does work the same way as testing for out of all ranges. And testing if the value is inside one of the ranges is more simple code.

But I think the symbolic evaluation with "eval" and "assume" functions would be more generic here (and more simple code). Then the way of cutting-of the bad ranges is usable (probably still there is other solution).

503

If evalCall is used it could be more simple to test and apply the constraints for arguments and return value in a "single step".

Probably a better solution can be:
For every "case" build a single SVal that contains all argument constraints for that case. It is possible using multiple evalBinOp calls (with <=, >=, logical or) to build such a condition (or repeated calls to other assume functions to cut off outer ranges). If the condition can be satisfied (by assume) add the new state, the condition for return value can be added here too. Repeat this for every different case. If no applicable case is found none of the conditions can be assumed, this means argument constraint error.

martong updated this revision to Diff 245651.Feb 20 2020, 7:05 AM
martong marked 8 inline comments as done.
  • Add new Checker that does the report
  • Refactor with negated RangeValues
  • Add overload to findFunctionSummary
  • Add tests for symbolic values
  • Add test file for bug path

I've done a major refactor with the handling of argument constraints. I am now reusing ValueRange::apply in checkPreCall on "negated" value ranges to check if the constraint is violated.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
452

What about warning messages with placeholders? E.g. "Argument constraint of {nth} argument of {fun} is not satisfied. The value is {v}, however it should be in {range}."
There will be a bunch of functions whose warning message template would be the same. On the other hand some others could have different warnings, and that justifies the need for specialized warnings.
Still, I think the warning message in the summary should be optional, because otherwise it would be really hard to automatically add summaries from other sources (like from cppcheck).

No matter how it turns out, this should be handled in a different patch.

455

Ok, I added a separate test file where the tests focus on the bug path.

martong updated this revision to Diff 245652.Feb 20 2020, 7:10 AM
  • Remove leftover call from test
steakhal added inline comments.Feb 21 2020, 4:05 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
441

StringRef

balazske added inline comments.Feb 23 2020, 11:44 PM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
461

Is this addTransition needed? It would be OK to call generateErrorNode with State. Even if not, adding the transition before should not be needed?

719–720

Why is this {128, UCharMax} here and at the next entry needed?

728

Is this ArgConstraint intentionally added only to isalnum?

martong updated this revision to Diff 246193.Feb 24 2020, 6:12 AM
martong marked 7 inline comments as done.
  • Use StringRef for Msg
  • Remove superfluous addTransition
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
461

Yes, you are right it is superfluous, I removed it.

martong added inline comments.Feb 24 2020, 6:12 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
719–720

This is the local specific range , [128, 255]. There are characters like ä which we don't know if they are treated as an alphanumerical character or not. We can't really tell how a specific libc implementation classifies them. On the other hand, with English letters we can state the classes confidently.

728

Yes, I wanted to create first the infrastructure and then later to add all these constraints to the rest of the summaries with new tests.

martong marked an inline comment as done.Feb 24 2020, 10:00 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
165

This default branch is not needed here (actually gives a compiler warning too).

gamesh411 added inline comments.Feb 27 2020, 1:12 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
704–706

If the EOF is not used in the TU analyzed, then there would be no way to find the specific #define.
Another approach would be to check if the value is defined by an expression that is the EOF define (maybe transitively?).

It may be useful to make a "macro value map" kind of object. Some macros can be added to it as a string, and it is possible to lookup for an Expr if one of the added macros is used there. This can be done by checking the concrete (numeric) value of the Expr and compare to the value of the macro, or by checking if the expression comes from a macro and take this macro name (use string comparison). Such an object can be useful because the functionality is needed at more checkers, for example the ones I am working on (StreamChecker and ErrorReturnChecker too).

something like this:

class MacroUsageDetector {
public:
  void addMacroName(StringRef MName);
  bool isMacroUsed(StringRef MName, Expr *E, ???);
  APSInt getMacroValue(StringRef MName);
};

Or one that handles a single macro?

The high level idea and the implementation of the checker seems great. In general, things that you want to address in later patches should be stated in the code with a TODO. I wrote a couple nits that I don't want to delete, but maybe it'd be better to address them after the dependency patch is agreed upon.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

How about we add an example as well?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
9–10

I suspect this comment is no longer relevant.

156

Maybe complement would be a better name? That sounds a lot more like a set operation. Also, this function highlights well that inheritance might not be the best solution here.

197

I think that is a rather poor example to help understand what list of list of ranges means :) -- Could you try to find something better?

437–445

While I find your usage of lambdas fascinating, this one seems a bit unnecessary :)

440

That is a TODO, rather :^)

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

Hmm, why do we have 2 different test files that essentially do the same? Shouldn't we only have a single one with analyzer-output=text?

clang/test/Analysis/std-c-library-functions.c
1–32

What a beautiful sight. Thanks.

balazske added a comment.EditedMar 2 2020, 8:09 AM

Is it sure that the signedness in the ranges is handled correctly? The EOF is a negative value but the RangeInt is unsigned type. The tryExpandAsInteger returns int too that is put into an unsigned RangeInt later. Probably it is better to use APSInt for the ranges? (The problem exists already before this change.)

martong marked 9 inline comments as done.Mar 6 2020, 9:51 AM

It may be useful to make a "macro value map" kind of object. Some macros can be added to it as a string, and it is possible to lookup for an Expr if one of the added macros is used there. This can be done by checking the concrete (numeric) value of the Expr and compare to the value of the macro, or by checking if the expression comes from a macro and take this macro name (use string comparison). Such an object can be useful because the functionality is needed at more checkers, for example the ones I am working on (StreamChecker and ErrorReturnChecker too).

Please see my previous answer to @gamesh411

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

You mean like NonNull or other constraints?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
9–10

Uh, yes.

156

Well, we check the argument constraint validity by trying to apply it's logical negation. In case of a range inclusion this is being out of that range. In case of non-null this is being null. And so on. The logic how we try to check an argument constraint is the same in all cases of the different constraints. And that is the point: in order to support a new kind of constraint we just have to figure out how to "apply" and "negate" one constraint. In my opinion this is a perfect case for polimorphism.

197

Yeah, that part definitely should be reworded.

704–706

I believe that the given standard C lib implementation (e.g. glibc) must provide a header for the prototypes of these functions where EOF is also defined transitively in any of the dependent system headers. Otherwise user code could misuse the value of EOF and thus the program would behave in an undefined manner.

C99 clearly states that you should #include <ctype.h> to use isalhpa.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

No, I wanted to have two different test files to test two different things: (1) We do have the constraints applied (here we don't care about the warnings and the path)
(2) Check that we have a warning with the proper tracking and notes.

clang/test/Analysis/std-c-library-functions.c
1–32

Anytime :D

martong marked 2 inline comments as done.Mar 6 2020, 10:00 AM

Is it sure that the signedness in the ranges is handled correctly? The EOF is a negative value but the RangeInt is unsigned type. The tryExpandAsInteger returns int too that is put into an unsigned RangeInt later. Probably it is better to use APSInt for the ranges? (The problem exists already before this change.)

That is not a problem, because finally in apply we use an APSInt that is constructed by considering the correct T type, e.g.:

const llvm::APSInt &Min = BVF.getValue(R[I].first, T);

We could consider RangeInt as a buffer that is big enough to hold the representation of the range values. The concrete interpretation of the bits (as T) is done by APSInt.

Just littering some more inlines, don't mind me :) Lets still wait on the dependency patch before updating.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

Like

Check constraints of arguments of C standard library functions, such as whether the parameter of isalpha is in the range [0, 255] or is EOF.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

How about ValueConstraintRef?

156

We agreed on inheritance in the previous patch, and regarding the name, sure, leave it as-is. :)

NoQ accepted this revision.Mar 15 2020, 9:51 PM

Looks great as long as other reviewers are happy, thanks!

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Maybe we should add an assertion that the same argument isn't specified multiple times.

martong marked 16 inline comments as done.Mar 17 2020, 10:25 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
299

Ok, done.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

Yeah, we have ProgramStateRef and SymbolRef. And both are actually just synonyms to smart pointers. I'd rather not call a pointer as a reference, because that can be confusing when reading the code. E.g. when I see that we return with a nullptr from a function that can return with a ...Ref I start to scratch my head.

197

I added an example with isalpha.

437–445

Ok I moved it to be a member function named ReportBug.

456–458

I think there could be cases when we want to have e.g. a not-null constraint on the 1st argument, but also we want to express that the 1st argument's size is described by the 2nd argument. I am planning to implement such a constraints in the future. In that case we would have two constraints on the 1st argument and the assert would fire.

martong updated this revision to Diff 250820.Mar 17 2020, 10:25 AM
martong marked 4 inline comments as done.
martong removed a subscriber: DenisDvlp.
  • Address review comments
martong updated this revision to Diff 250869.Mar 17 2020, 12:45 PM
martong marked an inline comment as done.
  • tmp -> Tmp
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
165

This default branch is not needed here (actually gives a compiler warning too).

I am not sure why I thought that that's not needed, actually we need that. (Perhaps an intermediate version returned in each cases.)

Szelethus requested changes to this revision.Mar 17 2020, 8:05 PM

LGTM, aside from some checker tagging nightmare. Its a bit easy to mess up, please allow me to have a final look before commiting! :)

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
298

Just noticed, this checker still lies in the apiModeling package. Could we find a more appropriate place?

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
91–92

Sure, I'm sold.

264

By passing this, the error message will be tied to the modeling checker, not to the one you just added. BugType has a constructor that accepts a string instead, pass CheckNames[CK_StdCLibraryFunctionArgsChecker] in there :) Also, how about BT_InvalidArgument or something?

This revision now requires changes to proceed.Mar 17 2020, 8:05 PM
balazske added inline comments.Mar 18 2020, 2:27 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
112

Is it better done with = 0?

197

The "branches" are the structures that define relations between arguments and return values? This could be included in the description.

301

This should be called reportBug.

martong marked 11 inline comments as done.Mar 18 2020, 7:46 AM
martong added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
298

Technically speaking this is still api modeling. In midterm we'd like to add support for more libc functions, gnu and posix functions, they are all library functions i.e. they provide some api.
Of course in long term, we'd like to experiment by getting some constraints from IR/Attributor, but we are still far from there.

clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
112

Not all of the constraint classes must implement this. Right now, e.g. the ComparisonConstraint does not implement this, because there is no such summary (yet) that uses a ComparisonConstraint as an argument constraint.

197

Not exactly. A branch represents a path in the exploded graph of a function (which is a tree).
So, a branch is a series of assumptions. In other words, branches represent split states and additional assumptions on top of the splitting assumption. I added this explanation to the comments.

264

Thanks, good catch, I did not know about that. Please note that using CheckNames requires that we change the BT member to be lazily initialized. Because CheckNames is initialized only after the checker itself is created, thus we cannot initialize BT during the checkers construction, b/c that would be before CheckNames is set. So, I changed BT to be a unique_ptr and it is being lazily initialized in reportBug.

301

Yeah, can't get used to this strange naming convention that LLVM uses. Fixed it.

martong updated this revision to Diff 251083.Mar 18 2020, 7:46 AM
martong marked 5 inline comments as done.
  • Add comments about what is a branch
  • Do not use 'this' for BugType
  • Lazily init BT and BT -> BT_InvalidArg
  • ReportBug -> reportBug
Szelethus accepted this revision.Mar 19 2020, 12:52 PM

Whoo! The patch looks great and well thought out, the tests look like they cover everything and we also talked about plans for future patches. Excellent!

I left a nit about merging the test files, but I'll leave it up to you to address or ignore it.

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

What if we had different -verifys? clang/test/Analysis/track-conditions.cpp is a great example.

This revision is now accepted and ready to land.Mar 19 2020, 12:52 PM
martong updated this revision to Diff 251648.Mar 20 2020, 8:18 AM
martong marked an inline comment as done.
  • Use prefixes for -verify to check different things in the same test file
martong marked an inline comment as done.Mar 20 2020, 8:19 AM

Thanks for the review guys!

clang/test/Analysis/std-c-library-functions-arg-constraints.c
2–8

Yeah, that's a very good approach, I just changed it like that. :)

This revision was automatically updated to reflect the committed changes.
NoQ added inline comments.Mar 25 2020, 12:06 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Wait, i misunderstood the code. It's even worse than that: you're adding transitions in a loop, so it'll cause state splits for every constraint. Because you do not intend to create multiple branches here, there needs to be exactly one addTransition performed every time checkPreCall is called. I.e., for now this code is breaking everything whenever there's more than one constraint, regardless of whether it's on the same argument.

martong marked an inline comment as done.Mar 25 2020, 7:33 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Yeah, that's a very good catch, thanks! I am going to prepare a patch to fix this soon. My idea is to store the SuccessSt and apply the next argument constraint on that. And once the loop is finished I'll have call the addTransition().

NoQ added inline comments.Mar 25 2020, 8:15 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Yup, that's the common thing to do in such cases.

NoQ added inline comments.Mar 25 2020, 9:29 AM
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

While we're at it, could you try to come up with a runtime assertion that'll help us prevent these mistakes?

Like, dunno, make CheckerContext crash whenever there's more than one branch being added, and then add a method to opt out when it's actually necessary to add more transitions (i.e., the user would say C.setMaxTransitions(2) at the beginning of their checker callback whenever they need to make a state split, defaulting to 1). It's a bit tricky because i still want to allow multiple transitions when they allow one branch (i.e., transitions chained together) but i think it'll take a lot of review anxiety from me because it's a very dangerous mistake to make and for now code review is the only way to catch it. So, yay, faster code reviews.

I just created a quick fix for the issue: https://reviews.llvm.org/D76790

martong marked an inline comment as done.Mar 25 2020, 11:23 AM
martong added inline comments.
clang/lib/StaticAnalyzer/Checkers/StdLibraryFunctionsChecker.cpp
456–458

Hmm I see your point and I agree this would be a valuable sanity check. But if you don't mind I'd like to address this in a different and stand-alone patch (independently from the quick-fix https://reviews.llvm.org/D76790) because it does not seem to be trivial for me.

My first concern is this: if we have 1 as the default value for maxTranisitions then we should add an extra C.setMaxTransitions(N) in every checker callback that does a state split, is that right?