This is an archive of the discontinued LLVM Phabricator instance.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	Again, you will have to highlight the allocation site with a note. That means you will have to write a bug visitor that traverses the size expression at some point (or, equivalently, a note tag when the size expression is evaluated), so you don't need to store the expression in the program state.
clang/test/Analysis/cert/str31-alloc.cpp
43	The fix is not correct. It should be `sizeof(buf3) - 1`, otherwise you still have a buffer overflow.

NoQ added inline comments.Nov 4 2019, 11:53 AM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
185	Also, which is probably more important, you will never be able to provide a fixit for the malloced memory case, because there may be multiple execution paths that reach the current point with different size expressions (in fact, not necessarily all of them are malloced). E.g.: `char *x = 0; char y[10]; if (coin()) { x = malloc(20); } else { x = y; } gets(x);` If you suggest replacing `gets(x)` with `gets_s(x, 20)`, you'll still have a buffer overflow on the else-branch on which `x` points to an array of 10 bytes.
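
A minimal C sketch of the situation described in this comment. The `sized_buf` struct and the helper names are hypothetical and not part of the patch, and `fgets()` stands in for `gets_s()` because the latter is optional in C. It only illustrates that the capacity differs per path, so no single constant can be pasted into a fixit:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper type: a buffer pointer paired with its capacity. */
struct sized_buf {
    char *ptr;
    size_t cap;
    int heap; /* nonzero if ptr came from malloc() and must be freed */
};

/* Mirrors the two branches from the comment:
   20 heap bytes on one path, a caller-provided array on the other. */
struct sized_buf pick_buffer(int coin, char *fallback, size_t fallback_cap) {
    struct sized_buf b = { fallback, fallback_cap, 0 };
    if (coin) {
        char *p = malloc(20);
        if (p != NULL) {
            b.ptr = p;
            b.cap = 20;
            b.heap = 1;
        }
    }
    return b;
}

/* Reads one line using the capacity that is valid on the current path. */
char *read_line(struct sized_buf b, FILE *in) {
    assert(b.ptr != NULL && b.cap > 0);
    return fgets(b.ptr, (int)b.cap, in);
}
```

Carrying the capacity next to the pointer is a change to the program's design, which is exactly why it cannot be produced mechanically from a single explored path.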

NoQ mentioned this in D69726: [analyzer] DynamicSize: Store the dynamic size.Nov 4 2019, 11:54 AM

Charusso added inline comments.Nov 4 2019, 12:32 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	Yes, you have pointed out the necessary visitor, but it needs more thought. I have a memory region which could be any kind of "memory block region", so I have no idea where the size expression is. We support ~20 different allocation functions, which is nothing compared to what is in the wild, where allocators with 5+ parameters are not uncommon. Therefore I still do not want to reverse-engineer a small `MallocChecker` + `ExprEngine` + `BuiltinFunctionChecker` inside my checker. They already provide the necessary `DynamicSizeInfo`, which could be used in at least 4 checkers at the moment (as I pointed out earlier in D68725). If I have the size expression in the dynamic size map and I can clearly identify the destination buffer, it is much simpler to traverse the graph to find where the buffer and its size come from.
185	This checker is going to evolve a lot, and all of the checked function calls have issues like that. I cannot even guess what other issues they have. I would like to deal with false-alarm suppression once we actually emit alarms. Would that be okay? I would really like to see alarms first. For example, I have seen 8-parameter allocators in the wild, so we need to rely on the checkers that provide information about allocations.
clang/test/Analysis/cert/str31-alloc.cpp
43	Good catch, thanks! I was too focused on the pretty-printing; we should not emit a fix-it here.

Charusso added inline comments.Nov 4 2019, 2:21 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	Well, you really do not want to store the `SizeExpr` of `malloc(SizeExpr)`, and you are right that I will have to traverse from it to see where the `SizeExpr` comes from and whether it is ambiguous. I want to rely on `trackExpressionValue()`, as the `SizeExpr` is available through `getDynamicSizeExpr()`, so it is one or two lines of code. Would you instead write your own switch-case to find where the size expression sits in the allocation call and use `trackExpressionValue()` on it, so that nothing is stored in the global state, which results in better run-time and less memory use? At first I really wanted to model `malloc()`, `realloc()` and so on myself, then I realized that `MallocChecker` provides all the information I need. Would it be a better idea to create my own tiny `MallocChecker` inside my checker which does nothing but mark the size expression as interesting with `NoteTags`? I am also thinking of a switch-case on the `DefinedOrUnknownSVal Size`, which has an expression somewhere inside it that I could call `trackExpressionValue()` on. Basically we are missing rules about what to use, and I have picked the easiest solution. Could you please share which would be the right direction for such a simple task?

NoQ added inline comments.Nov 4 2019, 2:34 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	"I want to rely on `trackExpressionValue()`, as the `SizeExpr` is available through `getDynamicSizeExpr()`, so it is one or two lines of code." This won't work. `trackExpressionValue()` can only track an active expression (that has, or at least should have, a value in the bug-node's environment). You'll have to make a visitor or a note tag. You can either make your own visitor (which will detect the node in which the extent information becomes present), or convert `MallocChecker` to use note tags and then inter-operate with those tags (through the interestingness map - "i mark the symbol as interesting so i'm interested in highlighting the allocation site" - or a similar mechanism). The second approach is more work because no such interoperation has ever been implemented yet, but it should be pretty rewarding for the future.

NoQ added inline comments.Nov 4 2019, 2:41 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
185	summons @Szelethus Apart from the obviously syntactic cases, you might actually be able to implement fixits for the situation when the reaching-definitions analysis displays exactly one definition for `x`, which additionally coincides with the allocation site. If that definition is a simple assignment, you'll be able to re-run the reaching definitions analysis for the RHS of that assignment. If that definition comes from a function call, you might be able to re-run the reaching definitions analysis on the return statement(s) of that function (note that this function must have been inlined during path-sensitive analysis, otherwise no definition in it would coincide with the allocation site). And so on. This problem sheds some light on how much we want to make the reaching definitions analysis inter-procedural. My current guess is that we probably don't need to; we'd rather have this guided by re-running the reaching-definitions analysis based on the path-sensitive report data, than have the reaching-definitions analysis be inter-procedural on its own.

This comment has been deleted.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	"This won't work. `trackExpressionValue()` can only track an active expression (that has, or at least should have, a value in the bug-node's environment). You'll have to make a visitor or a note tag." So, because the `size` symbol most likely dies after the `malloc()`, `trackExpressionValue()` cannot track it, as it cannot track dead symbols? We could make the liveness of `size` depend on the `buffer`; we have some dependency logic for that. It would also reflect the truth: the size is part of that memory block's region. After that, could we track the expression of the `size`?

NoQ added inline comments.Nov 4 2019, 3:15 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	"So because most likely after the `malloc()` the `size` symbol dies...?" After the `malloc()` is consumed, the size expression dies and gets cleaned up from the Environment. The symbol will only die if the value wasn't put into the Store in the process of modeling the statement that consumes the `malloc()` expression (such as an assignment). But `trackExpressionValue()` can only track live (active) expressions.

Use existing visitors.
Make the MallocBugVisitor available for every checker.
Fix duplication of fix-its on the warning path piece when we emit a note as well.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
125	I see. Now I have tried out what we have. `trackExpressionValue()` has a lookup that finds the `ExplodedNode` where the lvalue (the value of 'Ex') was computed: `static const ExplodedNode *findNodeForExpression(const ExplodedNode *N, const Expr *Inner) { while (N) { if (N->getStmtForDiagnostics() == Inner) return N; N = N->getFirstPred(); } return N; }` From that point on the expression was alive, and tracking is fine. The `InnerPointerChecker` has introduced a place, `AllocationState.h`, to communicate with the `MallocBugVisitor`. I believe this is the simplest way to communicate.
185	That is a cool idea! I hope @Szelethus has time for his project.

Do not try to fix-it an array with offsets.

Support alloca().

Hmm, so this checker is rather a collection of CERT rule checkers, right? Shouldn't the checker name contain the actual rule name (STR31-C)? User interfacewise, I would much prefer smaller, leaner checkers than a big one with a lot of options, which are barely supported for end-users. I would expect a cert package to contain subpackages like cert.str, and checker names cert.str.31StringSize, or similar. Also, shouldn't we move related checkers from security.insecureAPI to cert? Or just mention the rule name in the description, and continue not having a cert package?

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
70	Hmm. We have a variety of checkers that check for a CERT rule. Maybe we should put the rest here as well; it would better follow the clang-tidy interface. I'll ask around in the office.
clang/lib/StaticAnalyzer/Checkers/AllocationState.h
29	I would prefer if this header file didn't exist, or was thought out better, because it's messy that we hide `MallocChecker`, but expose its guts like this. The change is fine, this is just a critique of the checker.
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
185	This sounds very cool! Once we're at the bug report construction phase, we can make reaching definitions analysis "interprocedural enough" for most cases, I believe.

In D69813#1734193, @Szelethus wrote:

Hmm, so this checker is rather a collection of CERT rule checkers, right? Shouldn't the checker name contain the actual rule name (STR31-C)? User interfacewise, I would much prefer smaller, leaner checkers than a big one with a lot of options, which are barely supported for end-users. I would expect a cert package to contain subpackages like cert.str, and checker names cert.str.31StringSize, or similar.

It covers the STR rules of CERT, nothing else. Most of the rules are tied together, which is why the checker needs to be designed as one checker at first. I am not sure which part of STR I will cover, so as the checker evolves and some functions no longer need the same helper methods, we may need to create new checkers. STR31 and STR32 are my projects, which are effectively one single project. Also, I did not expect users to specify the rule number, but this checker could be named something like cert.str.Termination. There are two floating-point CERT checkers inside insecureAPI, which is why I have introduced the cert package; it will have three members, one of which is this new checker. I think a new package is only necessary if it contains at least two checkers.

Also, shouldn't we move related checkers from security.insecureAPI to cert? Or just mention the rule name in the description, and continue not having a cert package?

We should not. They do not fit the CERT rules, although insecureAPI does have two CERT floating-point checkers. The cert package should be clearly described by CERT rules. I want to move those CERT checkers from insecureAPI into the cert package and leave the rest.

In D69813#1734272, @Charusso wrote:

It covers the STR rules of CERT, nothing else. Most of the rules are tied together, which is why the checker needs to be designed as one checker at first. I am not sure which part of STR I will cover, so as the checker evolves and some functions no longer need the same helper methods, we may need to create new checkers. STR31 and STR32 are my projects, which are effectively one single project. Also, I did not expect users to specify the rule number, but this checker could be named something like cert.str.Termination. There are two floating-point CERT checkers inside insecureAPI, which is why I have introduced the cert package; it will have three members, one of which is this new checker. I think a new package is only necessary if it contains at least two checkers.

Implementation-wise that sounds wonderful; these rules are similar enough to have a single checker responsible for them. In the user interface, however (enabling different CERT rules), they should probably be split up into separate checkers per rule, rather than options, wouldn't you agree? @NoQ? Also, cert.str.Termination sounds like a wonderful name, I don't insist on the rule number being present in it, but at the very least, it should be in the description.

Also, shouldn't we move related checkers from security.insecureAPI to cert? Or just mention the rule name in the description, and continue not having a cert package?

We should not. They do not fit the CERT rules, although insecureAPI does have two CERT floating-point checkers. The cert package should be clearly described by CERT rules. I want to move those CERT checkers from insecureAPI into the cert package and leave the rest.

I meant related ones only. I didn't go through the checkers individually, though; it might just be the case that insecureAPI doesn't implement any specific CERT rules :)

In D69813#1735804, @Szelethus wrote:

Implementation-wise that sounds wonderful; these rules are similar enough to have a single checker responsible for them. In the user interface, however (enabling different CERT rules), they should probably be split up into separate checkers per rule, rather than options, wouldn't you agree? @NoQ?

I'm not @NoQ, but I do agree that there should be a separate check per rule in terms of the UI presented to the user. The name should follow the rule ID like they do in clang-tidy, for some consistency there.

Also, cert.str.Termination sounds like a wonderful name, I don't insist on the rule number being present in it, but at the very least, it should be in the description.

I think that the rule number should be in the name. I'd probably go with cert.STR31-C or cert.str31-c (so it's clear which CERT standard the rule came from).

In D69813#1735988, @aaron.ballman wrote:

I'm not @NoQ, but I do agree that there should be a separate check per rule in terms of the UI presented to the user. The name should follow the rule ID like they do in clang-tidy, for some consistency there.
I think that the rule number should be in the name. I'd probably go with cert.STR31-C or cert.str31-c (so it's clear which CERT standard the rule came from).

We warmly welcome non-@NoQs! I think Artem really wanted to start moving in this direction to make the two tools work together, but I have seen that his project is unbelievably difficult, so sadly it is still some way off. Even though we are far from having multiple CERT rules in this package, if Tidy users like the code names, I cannot say no to starting the collaboration with Tidy. I would pick cert.str.31-c: as @Szelethus pointed out, we use lower-case words for package names, and then we can run every cert.str checker at once.

Thanks for the ideas @Szelethus, @aaron.ballman!

In D69813#1736045, @Charusso wrote:

We warmly welcome non-@NoQs! I think Artem really wanted to start moving in this direction to make the two tools work together, but I have seen that his project is unbelievably difficult, so sadly it is still some way off. Even though we are far from having multiple CERT rules in this package, if Tidy users like the code names, I cannot say no to starting the collaboration with Tidy. I would pick cert.str.31-c: as @Szelethus pointed out, we use lower-case words for package names, and then we can run every cert.str checker at once.

Would it make sense to use cert.str.31.c to remove the random dash? Would this also help the user to do something like cert.str.*.cpp if they want just the CERT C++ STR rules checked? Or can they do that already even with the -?

In D69813#1736611, @aaron.ballman wrote:

Would it make sense to use cert.str.31.c to remove the random dash? Would this also help the user to do something like cert.str.*.cpp if they want just the CERT C++ STR rules checked? Or can they do that already even with the -?

Well, we could introduce the packages cert.str.c and cert.str.cpp, with the rule number following: cert.str.c.31, where 31 would be the name of the checker, which sounds very strange. @Szelethus is the code owner of our frontend, so I would wait to see how he imagines the layout. As far as I know, to enable every C checker of the cert.str package we would need to create a c package, because we have no logic for putting * in the package name before the checker's name, and a c package makes clear what the user wants to do. I have now checked your cert.str.cpp page [1], and I think a cert.str.cpp invocation could pull in cert.str.c as a dependency, because the C rules apply to C++, as you have written.

On the other hand, we are trying to avoid changes of larger scope than necessary, because we do not know when cert.str.c would contain at least two checkers. That is why I kept it minimal and only introduced the cert package: we already have two FLP checkers inside the insecureAPI base checker.

[1] https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=88046330

In D69813#1736667, @Charusso wrote:

Well, we could introduce the packages cert.str.c and cert.str.cpp, with the rule number following: cert.str.c.31, where 31 would be the name of the checker, which sounds very strange. @Szelethus is the code owner of our frontend, so I would wait to see how he imagines the layout.

I wouldn't want to go with that approach because it confuses the names from the coding standard it's meant to check. I think a good policy is to try to keep the check names similar to the coding standard names whenever possible (regardless of the coding standard).

As far as I know, to enable every C checker of the cert.str package we would need to create a c package, because we have no logic for putting * in the package name before the checker's name, and a c package makes clear what the user wants to do. I have now checked your cert.str.cpp page [1], and I think a cert.str.cpp invocation could pull in cert.str.c as a dependency, because the C rules apply to C++, as you have written.

Yes, the C++ rules incorporate some of the C rules, but not all of them, which kind of complicates things. The STR section happens to take everything from the C STR section.

On the other hand, we are trying to avoid changes of larger scope than necessary, because we do not know when cert.str.c would contain at least two checkers. That is why I kept it minimal and only introduced the cert package: we already have two FLP checkers inside the insecureAPI base checker.

Understandable.

[1] https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=88046330

The packaging has not been addressed yet.
Inject the "zombie" size expression into the new function call (fgets) if none of the size expression's regions have been modified.

The idea is this: when we set up a variable with `size = 13;`, it modifies the region, but the size expression is not stored yet, so we do not invalidate anything. We then store the size of `malloc(size + 1)`; after that, dead-symbol purging kicks in and either invalidates the region or keeps it alive.

If the region of `size` is still alive after the purge point, we cannot inject the "zombie" `size + 1` as an expression; we need to obtain its concrete value: 14. (When the redefinition happens I wanted to create a `NoteTag`, but I have not seen a simple way to do so.)

If the region of `size` has been purged, it is safe to copy-and-paste the "zombie" `size + 1` as an expression.
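
One way to read the hazard behind this rule, as a small C sketch. The function names are hypothetical, and the sketch only models the source-level effect of a redefinition, not the analyzer's liveness logic: if `size` can still change after the allocation, pasting `size + 1` at a later call site may no longer equal the allocated capacity, so only the concrete value is safe.

```c
#include <assert.h>
#include <stdlib.h>

/* Case 1: `size` is unchanged between malloc() and the later use, so the
   expression `size + 1` still evaluates to the allocated capacity. */
size_t capacity_when_size_untouched(void) {
    size_t size = 13;
    char *buf = malloc(size + 1);
    size_t reused = size + 1; /* textual reuse of the size expression */
    free(buf);
    return reused;
}

/* Case 2: `size` is redefined after malloc(), so reusing the expression
   text would produce a value unrelated to the allocation. */
size_t capacity_when_size_redefined(size_t *stale) {
    size_t size = 13;
    char *buf = malloc(size + 1);
    size_t allocated = size + 1; /* the concrete capacity */
    size = 100;                  /* redefinition after the allocation */
    *stale = size + 1;           /* what a pasted `size + 1` would yield */
    free(buf);
    return allocated;
}
```

In the second function the stale expression yields 101 while the buffer holds 14 bytes, which is the situation where a fix-it would have to fall back to the concrete value.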

NoQ added inline comments.Nov 13 2019, 11:13 AM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	All right, so basically what you're saying is that literally every invocation of `gets()` deserves a warning. This means that for all practical purposes your checker is an AST-based checker, just implemented with path-sensitive callbacks. A path-sensitive checker emits warnings based on multiple events that happen sequentially along the path (use-after-free: "memory deallocated - that same memory used", division by zero: "value constrained to zero - something is being divided by that same value", etc.) but your checker emits the warning by looking at only one statement: "`gets()` is invoked". Do i understand correctly that your plan is to use path-sensitive analysis for fixits only? But you can't emit fixits for any truly path-sensitive warning anyway. Fixits must work correctly on all execution paths, so you cannot emit a correct fixit by looking at only one execution path. In order to emit fixits, you need to either resort to a pure AST check anyway ("this expression refers to an array of fixed size"), or maybe implement auxiliary data-flow analysis for a certain must-problem (eg., "the buffer argument may have exactly one possible value across all paths that reach `gets()`"). But in both cases the path-sensitive engine does literally nothing to help you; all the data that you'll need for your fixit will be available from the AST.

NoQ added inline comments.Nov 13 2019, 11:27 AM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	Like, i think this was an interesting investigation and i was genuinely curious about how this turns out to be, but for now it seems that the problem you're trying to solve cannot be solved this way. Path sensitive analysis is fundamentally applicable to only 50% of the problems (to "may-problems" but not to "must-problems"), and the problem you're trying to solve is in the latter category. I believe you'll have to fall back to the relatively boring task of adding fixits to `security.insecureAPI.gets`; but then, again, if you manage to employ use-def chains for this problem, that might be quite an inspiring start.

Charusso added inline comments.Nov 13 2019, 11:43 AM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	Most of the time the allocation that holds the arbitrary string happens in a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, `strncpy(dst)` and so on, which push new data into that memory block, and then careful developers write `dst[42] = '\0'`, which means all the reports on `dst` should be thrown away in a path-sensitive manner. Reallocations, re-bindings and other non-AST matters can be handled very easily with a non-AST checker like this one. Sometimes we work with a destination array like `memcpy(Foo[Bar.Baz]->Qux, ...)`, which cannot really be handled with just a simple AST-based checker. I cannot say at the moment that we can handle it with symbols, but symbols give us a much larger scope of information. Most of the time, because the Analyzer is much smarter than Tidy, we can emit fix-its very easily with the help of flow-sensitivity. I would create huge whitelists of what we want to fix and what we cannot, but at some point, if we model the symbols better, we can. Beyond that easy false-positive suppression and the small flow-sensitive rebinding matters, we can be sure what is going on in each string manipulation. `gets()` is a toy example where at most a `grep -rn 'gets('` could do better analysis than us. The real world looks like this: 1 `encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1) / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;` 2 `cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);` 3 `memcpy(cryptvector, secret, strlen(secret));` ... 4 `for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {` 5 `memcpy(cryptvector + strlen(secret), md5trailer, RADIUS_VECTOR_LENGTH);` ...
from `postgresql/src/backend/libpq/auth.c`. At `3` we would emit a warning, because the null terminator is left out by the wrong size of the copy, but at `5` we see that it was left out because the string continues at that offset, and, I don't know, at `6`, once we have modeled all the flow-sensitive information, the string is left non-terminated. Of course each of these things is local and AST-based (with a huge overhead of rebindings and impossible false-positive suppression), but when you have two of them, that is when the fun begins.
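
To make the point about lines `3` and `5` concrete, here is a toy C sketch. It is not the PostgreSQL code; `concat_terminated` and its parameters are hypothetical. The first `memcpy()` deliberately copies no terminator because a second copy continues at that offset, and the buffer is terminated once at the end. A checker that looks at the first call in isolation would see a non-terminated string:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Builds secret followed by trailer_len bytes of trailer in a fresh
   buffer and terminates the result once, after the last copy. */
char *concat_terminated(const char *secret, const char *trailer,
                        size_t trailer_len) {
    size_t secret_len;
    char *out;

    assert(secret != NULL && trailer != NULL);
    secret_len = strlen(secret);
    out = malloc(secret_len + trailer_len + 1);
    if (out == NULL)
        return NULL;

    memcpy(out, secret, secret_len);                /* no terminator yet */
    memcpy(out + secret_len, trailer, trailer_len); /* continues at offset */
    out[secret_len + trailer_len] = '\0';           /* terminate once */
    return out;
}
```

Whether the missing terminator after the first copy is a bug depends on what happens later on the same path, which is the argument made above for tracking the buffer path-sensitively.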

Charusso added inline comments.Nov 13 2019, 12:21 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	"if you manage to employ use-def chains for this problem, that might be quite an inspiring start." We have regions, so we do not need to rely on such chains in the AST world, if I understand your idea correctly from the "Use-define chain" wiki [1]. By the way, it is not that difficult a problem in the AST world: you need to create a recursive AST matcher on the `DeclRefExpr` with `std::function`. Basically, I want to implement all the STR rules in a logical order. Here is one of the examples from STR32-C [2], which is my last planned project at the moment: `void lessen_memory_usage(void) { wchar_t *temp; size_t temp_size; /* ... */ if (cur_msg != NULL) { temp_size = cur_msg_size / 2 + 1; temp = realloc(cur_msg, temp_size * sizeof(wchar_t)); /* temp and cur_msg may no longer be null-terminated */ if (temp == NULL) { /* Handle error */ } cur_msg = temp; cur_msg_size = temp_size; cur_msg_len = wcslen(cur_msg); } }` They really want to represent code in the wild. Please think about that problem in terms of an AST checker versus in terms of `getAsRegion` and `getDynamicElementCount`, comparing the size of the allocated memory block and injecting `cur_msg[temp_size - 1] = L'\0';` because the array would overflow. That is how cool and how smart the Analyzer is. It would take at most 10 minutes to implement if `evalBinOp` worked, or if the main, 10-year-old implementation of obtaining the element count worked. I am on the way to fixing the latter, but it will involve more path-sensitive information than you could imagine; for example, reusing the "zombie" size expression turned out to be a hard problem. After that it will be a lot easier to solve such problems, I believe. The local scope is the key, and at the moment the checker only tries to rewrite destination arrays which are local. I think we could detect when we emit multiple reports on a given call and, due to the ambiguity, drop such fix-its. I cannot imagine how difficult that would be with an AST checker.
It is more of a research project at the moment, because I have run into dozens of silly issues, beginning with `getExtent()`, so I cannot say this direction is 100% the future, but I picked the Analyzer over Tidy for this reason: it is smarter. [1] https://en.wikipedia.org/wiki/Use-define_chain [2] https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string
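
For reference, a runnable C sketch of the STR32-C example with the suggested `cur_msg[temp_size - 1] = L'\0';` line applied. The globals follow the names used in the quoted example; the error handling (returning early and keeping the old buffer) is an assumption made only to keep the sketch self-contained:

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Globals named after the STR32-C example quoted in the comment. */
wchar_t *cur_msg = NULL;
size_t cur_msg_size = 0;
size_t cur_msg_len = 0;

/* Shrinks cur_msg to roughly half its size and re-terminates it. */
void lessen_memory_usage(void) {
    wchar_t *temp;
    size_t temp_size;

    if (cur_msg != NULL) {
        temp_size = cur_msg_size / 2 + 1;
        assert(temp_size >= 1);
        temp = realloc(cur_msg, temp_size * sizeof(wchar_t));
        if (temp == NULL) {
            return; /* assumed handling: keep the old, still valid buffer */
        }
        cur_msg = temp;
        /* The shrink may have cut off the terminator; restore it
           before wcslen() walks the buffer. */
        cur_msg[temp_size - 1] = L'\0';
        cur_msg_size = temp_size;
        cur_msg_len = wcslen(cur_msg);
    }
}
```

With an 8-element message the buffer shrinks to 5 elements and `wcslen()` stops at the restored terminator instead of reading past the end of the new allocation.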

NoQ added inline comments.Nov 13 2019, 3:27 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	"It would take at most 10 minutes to implement if `evalBinOp` worked, or if the main, 10-year-old implementation of obtaining the element count worked." You have multiple different checkers that you want to implement and for each of them there are two parts of the problem: (1) Emit the warning, (2) Emit a fixit for that warning. Path-sensitive analysis is perfect for (1), at least for some checkers (not for the one in this patch). But if you try to rely on path-sensitive analysis for (2), your result will simply be incorrect for the reason stated above: there may be other execution paths that you haven't taken into account. And when you say "I would create huge whitelists of what we want to fix", this literally means repeating a lot of the work you did in D45050, as you have to write down matchers (or a CFG-based data flow analysis) for the cases that you really can fix. At this point you will not only be aware of the allocation site from which you can extract the size-expression, but also of all other possible allocation sites. So i honestly believe that you should drop the idea of using path sensitive analysis to help you with fixits, and instead focus on the two more-or-less-independent tasks of (a) developing path-sensitive checkers for the CERT problems you're interested in and (b) developing fixits for such checkers in a syntactic manner without relying on the information obtained via the path-sensitive analysis.

Given that we have two different projects, let us first build the path-sensitive error catching + false-positive suppression + design of the CERT rules, and once those are in good shape, get back to the impossible-to-solve problem: adjusting fix-its (path-sensitively).

I had not realized that I was creating two different projects, because in my mind the Analyzer would be useless if it were just an expensive grep, so, umm, sorry.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	"It would take at most 10 minutes to implement if `evalBinOp` worked, or if the main, 10-year-old implementation of obtaining the element count worked. You have multiple different checkers that you want to implement and for each of them there are two parts of the problem: (1) Emit the warning, (2) Emit a fixit for that warning." This misuse of string handling in C has thousands of instances across random open-source projects, which is why I would like to suggest fix-its. With the help of path-sensitive analysis I would like to prove that the fix-its are correct. "Path-sensitive analysis is perfect for (1), at least for some checkers (not for the one in this patch). But if you try to rely on path-sensitive analysis for (2), your result will simply be incorrect for the reason stated above: there may be other execution paths that you haven't taken into account." I know that path-sensitive analysis means "there is a path". But I also know that there is one single function body where the `(allocation, string manipulation, return data)` triplet takes place. Here we can consider every execution path, because we are in a local scope. We can be 100% sure when the region of the obtained size expression is not safe to reuse. I am not thinking about function boundaries, because people write code as that triplet. Sometimes the size expression comes from a function call, but that should be fine to obtain. "And when you say 'I would create huge whitelists of what we want to fix', this literally means repeating a lot of the work you did in D45050, as you have to write down matchers (or a CFG-based data flow analysis) for the cases that you really can fix. At this point you will not only be aware of the allocation site from which you can extract the size-expression, but also of all other possible allocation sites." There is one allocation site 99,99% of the time, and the 0,01% is your counterexample in this review.
Of course such a case must exist in the wild, somewhere deep in the woods, and of course the most error-prone case is the 0,01%, but the other 99,99% has the same issue, and we could provide fix-its easily if we do not focus mostly on the 0,01%. I want to make my checker non-alpha; even though none of the non-alpha checkers, nor Tidy, can provide 100% accuracy, the fix-its should be 100% accurate. So in the 0,01% case we detect it and do not emit a fix-it, that is all. And we hope someone creates a summary-based analysis, so that we can rewrite the 0,01% as well. "So i honestly believe that you should drop the idea of using path sensitive analysis to help you with fixits, and instead focus on the two more-or-less-independent tasks of (a) developing path-sensitive checkers for the CERT problems you're interested in and (b) developing fixits for such checkers in a syntactic manner without relying on the information obtained via the path-sensitive analysis." Plot twist: neither AST checkers nor path-sensitive checkers can solve this issue. You cannot claim that my approach is bad and so I should change it. I cannot claim that my approach is good and so we should go for it. I believe that when I dropped my Tidy career I picked the right tool, one which I can improve to reach 100% accuracy and to model the impossible-to-model things, like the STR rules. Of course, in a path-sensitive manner: to drop every heuristic, to model everything necessary with offsets and allocations, and for false-positive suppression. The key here is to change the Analyzer's statement from "There is a path" to "There is only one path", given that the string manipulation in this use case happens on a single path.

NoQ added a subscriber: gribozavr.Nov 13 2019, 6:33 PM

NoQ added inline comments.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	> There is one allocation site 99,99% of the time, where the 0,01% is your counter example in this review.

It might be 0,01% for this checker (and if so, a simple AST-based solution will work equally well), but for other checkers there will be a lot more problems: like, how do you null-terminate a string if the size of the string depends on the execution path?

> The key here is to modify the Analyzer from the statement "There is a path" to "There is only a path", according to the use case of the string manipulation being a single path.

This is a possible approach to one of our open projects, "Implement a dataflow framework", which would allow us to solve must-problems as well as may-problems. Like, it sounds easy: if we simply were able to figure out whether the analyzer has explored all paths or not in the current analysis, we will be able to solve may-problems in all cases in which the analyzer has actually explored all paths, by post-processing the `ExplodedGraph`. I've been advocating for this approach for some time in the past, but this approach is clearly a dead end because in most of the interesting cases (e.g., in the presence of any sort of loops) it'll reply "I don't know" ("we were clearly unable to explore all paths"). The more principled solution is to make an actual data flow framework in the usual sense (i.e., an API that would allow us to write "must-problem" checks in the same way as we currently write path-sensitive checks but with the extra join operation). I heard that @gribozavr has some plans in this area.
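The "extra join operation" mentioned here can be pictured with a small standalone model. This is only an illustrative sketch, not an analyzer API: `PathFact` and `joinSizeExpr` are hypothetical names, and the explored paths are assumed to be handed in as plain data rather than read from an `ExplodedGraph`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model: one entry per explored path that reaches the call,
// holding the source text of the destination buffer's size on that path.
struct PathFact {
  std::string SizeExpr; // e.g. "20" or "sizeof(y)"
};

// Join for a question of the form "does this hold on every path?".
// A size expression is reported only if every explored path agrees on it
// and the exploration is known to be exhaustive. An empty result means
// "unknown, do not emit a fix-it".
std::string joinSizeExpr(const std::vector<PathFact> &Paths,
                         bool ExploredAllPaths) {
  if (!ExploredAllPaths || Paths.empty())
    return "";
  const std::string &First = Paths.front().SizeExpr;
  for (const PathFact &P : Paths) {
    assert(!P.SizeExpr.empty() && "each path must carry a size expression");
    if (P.SizeExpr != First)
      return "";
  }
  return First;
}
```

With the `coin()` example the two paths carry `20` and `sizeof(y)`, so the join yields nothing. The `ExploredAllPaths` flag stands for the part that is hard in practice: with loops the exploration is rarely known to be complete, so the answer degrades to "unknown".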

Charusso marked an inline comment as done.Nov 13 2019, 7:46 PM

Charusso added inline comments.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
201–207	> how do you null-terminate a string if the size of the string depends on the execution path?

Here is your example, slightly modified: char *x = 0; char y[10]; if (coin()) { x = malloc(20); } else { x = y; } char src[] = "Foo"; strcpy(x, src); Because the source string is only available after the allocation, we cannot make sure the allocation can hold the entire source object, for which the appropriate allocation would be `malloc(strlen(src) + 1)` or something similar. The `free()` is immediately another hard problem. It is that 0,01% case which you do not encounter in the wild. The usual approach is: have a memory block, fill it, return it. But the Static Analyzer is made for that purpose, so it can detect such path-sensitive information and prevent emitting a fix-it.

> "Implement a dataflow framework"

That is a cool idea, but at first I want to think about local scopes. Within a single function body: that is the scope I can safely measure, I believe. If it turns out to be a good idea, we could improve on it step by step. I really wanted to measure my stuff, but every time I finished I had to rewrite it nearly from scratch, so I cannot say this checker is special enough to justify rewriting half of the Analyzer. Maybe it will be. I have learnt nothing from Tidy but this: prove that the fix-it is valid; if it is, emit it, otherwise throw it away. Tidy cannot measure such information, and that is where the flow-sensitive stuff kicks in. If our engine were perfect, with dataflow-ctu-summary-dunnowhat analyses collaborating, maybe we would consider rewriting such crazily difficult code. I think we are so far away from that level that we have to avoid thinking of that kind of fix-it. But the 99,99% is totally worth it and easy to measure with the help of dynamic size information; I have implemented most of the STR31-C and STR32-C examples based on that idea, with fix-its.

I cannot emphasize enough how simple the way the string is manipulated is, and how easily it could be modeled with memory block regions or the dynamic "used size" of the memory block, and stuff like that. It is just a sequence of data flow, from one given allocation to one given return, with the craziest offsets on the destination array and with the craziest custom allocators. If you would like to cover more: we are not interested in multiple buffers, but rather in one single buffer with a custom allocator where the size expression should be easy to obtain. I have taken that heuristic and it worked out quite well. For example, Git's allocator injects the null terminator (safely, they say in a comment) when they pass a wrong size to it: `strlen(src)`. So it would find possible false positives, but a comment is not really the way we null-terminate. This is one of the most needed checkers, so I understand why you want to make it as precise as possible with the largest set of possibilities, but incremental development, eh? People are lame and cannot count, so they write `memcpy(dest, "Foobar", 6)`, where the next statements tell us whether the programmer was lame or not (and on the previous line the `memcpy()` has an appropriate size expression). Please think about that kind of vulnerability when there is a Windows update every week. That was my base case when I started this project 2 years ago.
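The `memcpy(dest, "Foobar", 6)` case can be reproduced in a few lines. The following is a minimal sketch for illustration only; the function names are hypothetical and nothing here comes from the checker itself. It shows that a byte count of 6 copies the characters but not the null terminator, while `sizeof("Foobar")` (7 bytes) includes it.

```cpp
#include <cassert>
#include <cstring>

// Copies the six characters of "Foobar" and leaves Dest[6] untouched.
// Dest must have room for at least 7 bytes.
void copyWithoutTerminator(char *Dest) {
  assert(Dest);
  std::memcpy(Dest, "Foobar", 6);
}

// Copies all seven bytes of the literal, including the trailing '\0'.
void copyWithTerminator(char *Dest) {
  assert(Dest);
  std::memcpy(Dest, "Foobar", sizeof("Foobar"));
}

// Demonstration helper: pre-fill the buffer with 'X' so that a missing
// terminator is observable at index 6 after the copy.
bool isTerminatedAfterCopy(void (*Copy)(char *)) {
  char Buf[8];
  std::memset(Buf, 'X', sizeof(Buf));
  Copy(Buf);
  return Buf[6] == '\0';
}
```

Whether the first form is a bug depends on what follows: if the next statement writes `Dest[6] = '\0'`, the code is fine, which is why the surrounding statements have to be inspected before a warning or fix-it is justified.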

This patch moved to D70411.

Charusso removed a parent revision: D69746: [analyzer] FixItHint: Apply and test hints with the Clang-Tidy's script.Dec 6 2019, 4:24 PM

Revision Contents

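Path	Size
clang/include/clang/Lex/Preprocessor.h	3 lines
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td	19 lines
clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h	25 lines
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h	16 lines
clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h	20 lines
clang/lib/StaticAnalyzer/Checkers/AllocationState.h	3 lines
clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt	1 line
clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp	4 lines
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp	297 lines
clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp	24 lines
clang/lib/StaticAnalyzer/Core/DynamicSize.cpp	41 lines
clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp	13 lines
clang/test/Analysis/Inputs/system-header-simulator.h	7 lines
clang/test/Analysis/analyzer-config.c	3 lines
clang/test/Analysis/cert/	89 lines, 48 lines, 33 lines, 34 lines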

Diff 228675

clang/include/clang/Lex/Preprocessor.h

Show First 20 Lines • Show All 1,026 Lines • ▼ Show 20 Lines	if (I == Submodules.end())
return false;		return false;
auto J = I->second.Macros.find(II);		auto J = I->second.Macros.find(II);
if (J == I->second.Macros.end())		if (J == I->second.Macros.end())
return false;		return false;
auto *MD = J->second.getLatest();		auto *MD = J->second.getLatest();
return MD && MD->isDefined();		return MD && MD->isDefined();
}		}

		MacroDefinition getMacroDefinition(StringRef Id) {
		return getMacroDefinition(&Identifiers.get(Id));
		}
MacroDefinition getMacroDefinition(const IdentifierInfo *II) {		MacroDefinition getMacroDefinition(const IdentifierInfo *II) {
if (!II->hasMacroDefinition())		if (!II->hasMacroDefinition())
return {};		return {};

MacroState &S = CurSubmoduleState->Macros[II];		MacroState &S = CurSubmoduleState->Macros[II];
auto *MD = S.getLatest();		auto *MD = S.getLatest();
while (MD && isa<VisibilityMacroDirective>(MD))		while (MD && isa<VisibilityMacroDirective>(MD))
MD = MD->getPrevious();		MD = MD->getPrevious();
▲ Show 20 Lines • Show All 1,311 Lines • Show Last 20 Lines

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td

	Show First 20 Lines • Show All 61 Lines • ▼ Show 20 Lines
	def Valist : Package<"valist">;			def Valist : Package<"valist">;

	def DeadCode : Package<"deadcode">;			def DeadCode : Package<"deadcode">;
	def DeadCodeAlpha : Package<"deadcode">, ParentPackage<Alpha>;			def DeadCodeAlpha : Package<"deadcode">, ParentPackage<Alpha>;

	def Performance : Package<"performance">, ParentPackage<OptIn>;			def Performance : Package<"performance">, ParentPackage<OptIn>;

	def Security : Package <"security">;			def Security : Package <"security">;
				def CERT : Package<"cert">, ParentPackage<Security>;
				Szelethus: Hmm. We have a variety of checkers that check for a CERT rule. Maybe we should put the rest here as well, if it would better follow the clang-tidy interface. I'll ask around in the office.
	def InsecureAPI : Package<"insecureAPI">, ParentPackage<Security>;			def InsecureAPI : Package<"insecureAPI">, ParentPackage<Security>;
	def SecurityAlpha : Package<"security">, ParentPackage<Alpha>;			def SecurityAlpha : Package<"security">, ParentPackage<Alpha>;
	def Taint : Package<"taint">, ParentPackage<SecurityAlpha>;			def Taint : Package<"taint">, ParentPackage<SecurityAlpha>;

	def Unix : Package<"unix">;			def Unix : Package<"unix">;
	def UnixAlpha : Package<"unix">, ParentPackage<Alpha>;			def UnixAlpha : Package<"unix">, ParentPackage<Alpha>;
	def CString : Package<"cstring">, ParentPackage<Unix>;			def CString : Package<"cstring">, ParentPackage<Unix>;
	def CStringAlpha : Package<"cstring">, ParentPackage<UnixAlpha>;			def CStringAlpha : Package<"cstring">, ParentPackage<UnixAlpha>;
	▲ Show 20 Lines • Show All 702 Lines • ▼ Show 20 Lines
	def FloatLoopCounter : Checker<"FloatLoopCounter">,			def FloatLoopCounter : Checker<"FloatLoopCounter">,
	HelpText<"Warn on using a floating point value as a loop counter (CERT: "			HelpText<"Warn on using a floating point value as a loop counter (CERT: "
	"FLP30-C, FLP30-CPP)">,			"FLP30-C, FLP30-CPP)">,
	Dependencies<[SecuritySyntaxChecker]>,			Dependencies<[SecuritySyntaxChecker]>,
	Documentation<HasDocumentation>;			Documentation<HasDocumentation>;

	} // end "security"			} // end "security"

				let ParentPackage = CERT in {

				def CERTStrChecker : Checker<"Str">,
				HelpText<"CERT checker of string related rules.">,
				CheckerOptions<[
				CmdLineOption<Boolean,
				"WantToUseSafeFunctions",
				"An integer non-zero value specifying if the target "
				"environment is considered to implement '_s' suffixed memory "
				"and string handler functions which are safer than older "
				"versions (e.g. 'memcpy_s()')",
				"true",
				Released>,
				]>,
				Documentation<NotDocumented>;

				} // end "CERT"

	let ParentPackage = SecurityAlpha in {			let ParentPackage = SecurityAlpha in {

	def ArrayBoundChecker : Checker<"ArrayBound">,			def ArrayBoundChecker : Checker<"ArrayBound">,
	HelpText<"Warn about buffer overflows (older checker)">,			HelpText<"Warn about buffer overflows (older checker)">,
	Documentation<HasAlphaDocumentation>;			Documentation<HasAlphaDocumentation>;

	def ArrayBoundCheckerV2 : Checker<"ArrayBoundV2">,			def ArrayBoundCheckerV2 : Checker<"ArrayBoundV2">,
	HelpText<"Warn about buffer overflows (newer checker)">,			HelpText<"Warn about buffer overflows (newer checker)">,
	▲ Show 20 Lines • Show All 614 Lines • Show Last 20 Lines

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h

	//=--- CommonBugCategories.h - Provides common issue categories -- C++ --===//			//=--- CommonBugCategories.h - Provides common issue categories -- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CLANG_STATICANALYZER_CORE_BUGREPORTER_COMMONBUGCATEGORIES_H			#ifndef LLVM_CLANG_STATICANALYZER_CORE_BUGREPORTER_COMMONBUGCATEGORIES_H
	#define LLVM_CLANG_STATICANALYZER_CORE_BUGREPORTER_COMMONBUGCATEGORIES_H			#define LLVM_CLANG_STATICANALYZER_CORE_BUGREPORTER_COMMONBUGCATEGORIES_H

	// Common strings used for the "category" of many static analyzer issues.			// Common strings used for the "category" of many static analyzer issues.
	namespace clang {			namespace clang {
	namespace ento {			namespace ento {
	namespace categories {			namespace categories {
	extern const char * const CoreFoundationObjectiveC;			extern const char *const CoreFoundationObjectiveC;
	extern const char * const LogicError;			extern const char *const LogicError;
	extern const char * const MemoryRefCount;			extern const char *const MemoryRefCount;
	extern const char * const MemoryError;			extern const char *const MemoryError;
	extern const char * const UnixAPI;			extern const char *const UnixAPI;
	extern const char * const CXXObjectLifecycle;			extern const char *const CXXObjectLifecycle;
	}			extern const char *const SecurityError;
	}			} // namespace categories
	}			} // namespace ento
	#endif			} // namespace clang

				#endif // LLVM_CLANG_STATICANALYZER_CORE_BUGREPORTER_COMMONBUGCATEGORIES_H

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h

	//===- DynamicSize.h - Dynamic size related APIs ----------------- C++ --===//			//===- DynamicSize.h - Dynamic size related APIs ----------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// This file defines APIs that track and query dynamic size information.			// This file defines APIs that track and query dynamic size information.
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H			#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H
	#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H			#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H

				#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramStateTrait.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState_Fwd.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"

	namespace clang {			namespace clang {
	namespace ento {			namespace ento {

				/// \returns The dynamic size info for the region \p MR.
				const DynamicSizeInfo *getDynamicSizeInfo(ProgramStateRef State,
				const MemRegion *MR);

	/// \returns The stored dynamic size for the region \p MR.			/// \returns The stored dynamic size for the region \p MR.
	DefinedOrUnknownSVal getDynamicSize(ProgramStateRef State, const MemRegion *MR,			DefinedOrUnknownSVal getDynamicSize(ProgramStateRef State, const MemRegion *MR,
	SValBuilder &SVB);			SValBuilder &SVB);

	/// \returns The stored dynamic size expression for the region \p MR.			/// \returns The stored dynamic size expression for the region \p MR.
	const Expr *getDynamicSizeExpr(ProgramStateRef State, const MemRegion *MR);			const Expr *getDynamicSizeExpr(ProgramStateRef State, const MemRegion *MR);

	/// \returns The stored element count of the region \p MR.			/// \returns The stored element count of the region \p MR.
	DefinedOrUnknownSVal getDynamicElementCount(ProgramStateRef State,			DefinedOrUnknownSVal getDynamicElementCount(ProgramStateRef State,
	const MemRegion *MR,			const MemRegion *MR,
	SValBuilder &SVB,			SValBuilder &SVB,
	QualType ElementTy);			QualType ElementTy);

	/// Set the dynamic size \p Size with its expression \p SizeExpr of the region			/// Set the dynamic size \p Size with its expression \p SizeExpr of the region
	/// \p MR.			/// \p MR.
	ProgramStateRef setDynamicSize(ProgramStateRef State, const MemRegion *MR,			ProgramStateRef setDynamicSize(ProgramStateRef State, const MemRegion *MR,
	DefinedOrUnknownSVal Size, const Expr *SizeExpr);			DefinedOrUnknownSVal Size, const Expr *SizeExpr);

				/// If a part of the size expression is modified later on the path the size
				/// expression cannot be used to represent the allocated memory block's size.
				/// This function tries to split up the size expression and see whether the
				/// \p ExplicitRegions contains one of the regions of the size expression which
				/// means the expression will be modified later on the path so we need to mark
				/// the size expression invalid.
				ProgramStateRef
				invalidateDynamicSizeExpr(ProgramStateRef State,
				ArrayRef<const MemRegion *> ExplicitRegions,
				const LocationContext *LCtx);

	} // namespace ento			} // namespace ento
	} // namespace clang			} // namespace clang

	#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H			#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZE_H

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h

	//===- DynamicSizeInfo.h - Runtime size information -------------- C++ --===//			//===- DynamicSizeInfo.h - Runtime size information -------------- C++ --===//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H			#ifndef LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H
	#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H			#define LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H

	namespace clang {
	namespace ento {

	#include "clang/AST/Expr.h"			#include "clang/AST/Expr.h"
	#include "clang/Basic/LLVM.h"			#include "clang/Basic/LLVM.h"
	#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"			#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"

				namespace clang {
				namespace ento {

	/// Helper class to store information about the dynamic size.			/// Helper class to store information about the dynamic size.
	class DynamicSizeInfo {			class DynamicSizeInfo {
	public:			public:
	DynamicSizeInfo(DefinedOrUnknownSVal Size, const Expr *SizeExpr = nullptr)			DynamicSizeInfo(DefinedOrUnknownSVal Size, const Expr *SizeExpr = nullptr,
	: Size(Size), SizeExpr(SizeExpr) {}			bool IsValid = true)
				: Size(Size), SizeExpr(SizeExpr), IsValid(IsValid) {}

	DefinedOrUnknownSVal getSize() const { return Size; }			DefinedOrUnknownSVal getSize() const { return Size; }

	const Expr *getSizeExpr() const { return SizeExpr; }			const Expr *getSizeExpr() const { return SizeExpr; }

				/// \returns The size expression being valid.
				bool isValid() const { return IsValid; }

	bool operator==(const DynamicSizeInfo &RHS) const {			bool operator==(const DynamicSizeInfo &RHS) const {
	return Size == RHS.Size && SizeExpr == RHS.SizeExpr;			return Size == RHS.Size && SizeExpr == RHS.SizeExpr &&
				IsValid == RHS.IsValid;
	}			}

	void Profile(llvm::FoldingSetNodeID &ID) const {			void Profile(llvm::FoldingSetNodeID &ID) const {
	ID.Add(Size);			ID.Add(Size);
	ID.AddPointer(SizeExpr);			ID.AddPointer(SizeExpr);
				ID.AddBoolean(IsValid);
	}			}

	private:			private:
	DefinedOrUnknownSVal Size;			DefinedOrUnknownSVal Size;
	const Expr *SizeExpr;			const Expr *SizeExpr;
				bool IsValid;
	};			};

	} // namespace ento			} // namespace ento
	} // namespace clang			} // namespace clang

	#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H			#endif // LLVM_CLANG_STATICANALYZER_CORE_PATHSENSITIVE_DYNAMICSIZEINFO_H

clang/lib/StaticAnalyzer/Checkers/AllocationState.h

	Show All 19 Lines
	ProgramStateRef markReleased(ProgramStateRef State, SymbolRef Sym,			ProgramStateRef markReleased(ProgramStateRef State, SymbolRef Sym,
	const Expr *Origin);			const Expr *Origin);

	/// This function provides an additional visitor that augments the bug report			/// This function provides an additional visitor that augments the bug report
	/// with information relevant to memory errors caused by the misuse of			/// with information relevant to memory errors caused by the misuse of
	/// AF_InnerBuffer symbols.			/// AF_InnerBuffer symbols.
	std::unique_ptr<BugReporterVisitor> getInnerPointerBRVisitor(SymbolRef Sym);			std::unique_ptr<BugReporterVisitor> getInnerPointerBRVisitor(SymbolRef Sym);

				/// \returns The MallocBugVisitor.
				std::unique_ptr<BugReporterVisitor> getMallocBRVisitor(SymbolRef Sym);
				Szelethus: I would prefer if this header file didn't exist, or was thought out better, because it's messy that we hide `MallocChecker` but expose its guts like this. The change is fine, this is just a critique of the checker.

	/// 'Sym' represents a pointer to the inner buffer of a container object.			/// 'Sym' represents a pointer to the inner buffer of a container object.
	/// This function looks up the memory region of that object in			/// This function looks up the memory region of that object in
	/// DanglingInternalBufferChecker's program state map.			/// DanglingInternalBufferChecker's program state map.
	const MemRegion *getContainerObjRegion(ProgramStateRef State, SymbolRef Sym);			const MemRegion *getContainerObjRegion(ProgramStateRef State, SymbolRef Sym);

	} // end namespace allocation_state			} // end namespace allocation_state

	} // end namespace ento			} // end namespace ento
	} // end namespace clang			} // end namespace clang

	#endif			#endif

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt

Show All 11 Lines	add_clang_library(clangStaticAnalyzerCheckers
BoolAssignmentChecker.cpp		BoolAssignmentChecker.cpp
BuiltinFunctionChecker.cpp		BuiltinFunctionChecker.cpp
CStringChecker.cpp		CStringChecker.cpp
CStringSyntaxChecker.cpp		CStringSyntaxChecker.cpp
CallAndMessageChecker.cpp		CallAndMessageChecker.cpp
CastSizeChecker.cpp		CastSizeChecker.cpp
CastToStructChecker.cpp		CastToStructChecker.cpp
CastValueChecker.cpp		CastValueChecker.cpp
		cert/StrChecker.cpp
CheckObjCDealloc.cpp		CheckObjCDealloc.cpp
CheckObjCInstMethSignature.cpp		CheckObjCInstMethSignature.cpp
CheckSecuritySyntaxOnly.cpp		CheckSecuritySyntaxOnly.cpp
CheckSizeofPointer.cpp		CheckSizeofPointer.cpp
CheckerDocumentation.cpp		CheckerDocumentation.cpp
ChrootChecker.cpp		ChrootChecker.cpp
CloneChecker.cpp		CloneChecker.cpp
ConversionChecker.cpp		ConversionChecker.cpp
▲ Show 20 Lines • Show All 93 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp

Show First 20 Lines • Show All 3,368 Lines • ▼ Show 20 Lines	if (!RS.isEmpty()) {
}		}
}		}
}		}

namespace clang {		namespace clang {
namespace ento {		namespace ento {
namespace allocation_state {		namespace allocation_state {

		std::unique_ptr<BugReporterVisitor> getMallocBRVisitor(SymbolRef Sym) {
		return std::make_unique<MallocBugVisitor>(Sym);
		}

ProgramStateRef		ProgramStateRef
markReleased(ProgramStateRef State, SymbolRef Sym, const Expr *Origin) {		markReleased(ProgramStateRef State, SymbolRef Sym, const Expr *Origin) {
AllocationFamily Family = AF_InnerBuffer;		AllocationFamily Family = AF_InnerBuffer;
return State->set<RegionState>(Sym, RefState::getReleased(Family, Origin));		return State->set<RegionState>(Sym, RefState::getReleased(Family, Origin));
}		}

} // end namespace allocation_state		} // end namespace allocation_state
} // end namespace ento		} // end namespace ento
Show All 35 Lines

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp

This file was added.

				//===- CERTStrChecker - Checker for CERT STR rules --------------- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines CERTStrChecker which tries to find and resolve the CERT
				// string handler function issues with fix-it hints.
				//
				// The rules can be found in section 'Rule 07. Characters and Strings (STR)':
				// https://wiki.sei.cmu.edu/confluence/pages/viewpage.action?pageId=87152038
				//
				//===----------------------------------------------------------------------===//

				#include "AllocationState.h"
				#include "clang/ASTMatchers/ASTMatchFinder.h"
				#include "clang/Lex/Lexer.h"
				#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
				#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
				#include "clang/StaticAnalyzer/Core/Checker.h"
				#include "clang/StaticAnalyzer/Core/CheckerManager.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"
				#include "llvm/ADT/Optional.h"
				#include <utility>

				using namespace clang;
				using namespace ento;
				using namespace ast_matchers;

				namespace {

				struct CallContext {
				CallContext(Optional<unsigned> DestinationPos)
				: DestinationPos(DestinationPos) {}

				Optional<unsigned> DestinationPos;
				};

				class CERTStrChecker
				: public Checker<eval::Call, check::RegionChanges, check::BeginFunction> {
				using StrCheck = std::function<void(const CERTStrChecker *, const CallEvent &,
				const CallContext &, CheckerContext &)>;

				public:
				bool evalCall(const CallEvent &Call, CheckerContext &C) const;
				void checkBeginFunction(CheckerContext &C) const;
				ProgramStateRef
				checkRegionChanges(ProgramStateRef State, const InvalidatedSymbols *,
				ArrayRef<const MemRegion *> ExplicitRegions,
				ArrayRef<const MemRegion *> Regions,
				const LocationContext *LCtx, const CallEvent *Call) const;

				std::unique_ptr<BugType> BT;
				mutable bool UseSafeFunctions;

				const CallDescriptionMap<std::pair<StrCheck, CallContext>> CDM = {
				// The following checks STR31-C rules.
				// char *gets(char *dest);
				{{"gets", 1}, {&CERTStrChecker::evalGets, {0}}}};

				void evalGets(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const;
				};
				} // namespace

				//===----------------------------------------------------------------------===//
				// Helper functions.
				//===----------------------------------------------------------------------===//

				// Returns the proper token based end location of \p E.
				static SourceLocation exprLocEnd(const Expr *E, CheckerContext &C) {
				assert(E);
				return Lexer::getLocForEndOfToken(E->getEndLoc(), /*Offset=*/0,
				C.getSourceManager(), C.getLangOpts());
				}

				// Returns a string representation of \p E.
				static std::string exprToStr(const Expr *E, CheckerContext &C) {
				assert(E);
				return Lexer::getSourceText(
				CharSourceRange::getTokenRange(E->getSourceRange()), C.getSourceManager(),
				C.getLangOpts());
				}

				static Optional<std::string> getDestSizeAsString(const CallEvent &Call,
				const CallContext &CallC,
				CheckerContext &C) {
				SVal DestV = Call.getArgSVal(*CallC.DestinationPos);
				const MemRegion *DestMR = DestV.getAsRegion();

				// Arrays.
				if (const auto *ER = dyn_cast<ElementRegion>(DestMR)) {
				// Dependent-sized array type or a member array.
				if (const auto *FR = dyn_cast<FieldRegion>(ER->getSuperRegion()))
				if (FR->getDecl()->getType()->isArrayType())
				if (const Expr *ArrayExpr = Call.getArgExpr(*CallC.DestinationPos))
				return "sizeof(" + exprToStr(ArrayExpr, C) + ")";

				// Constant or variable array type.
				if (const auto *VR = dyn_cast<VarRegion>(ER->getSuperRegion()))
				if (const ValueDecl *VD = VR->getDecl())
				if (VD->getType()->isArrayType())
				return "sizeof(" + VD->getNameAsString() + ")";
				}

				// 'malloc()' family functions, 'alloca()' and new[].
				ProgramStateRef State = C.getState();
				SValBuilder &SVB = C.getSValBuilder();
				if (const MemRegion *BaseMR = DestMR->getBaseRegion()) {
				if (const DynamicSizeInfo *SizeInfo = getDynamicSizeInfo(State, BaseMR)) {
				if (SizeInfo->isValid()) {
				if (const Expr *SizeExpr = SizeInfo->getSizeExpr())
				return exprToStr(SizeExpr, C);
				} else {
				DefinedOrUnknownSVal Size = SizeInfo->getSize();
				if (const llvm::APSInt *IntValue = SVB.getKnownValue(State, Size))
				return Twine(IntValue->getZExtValue()).str();
				}
				}
				}

				NoQ: Again, you will have to highlight the allocation site with a note. Therefore you will have to write a bug visitor that traverses the size expression at some point (or, equivalently, a note tag when the size expression is evaluated). Therefore you don't need to store the expression in the program state.
				Charusso (author): Yes, you have pointed out the necessary visitor, but it needs more thinking. I have a memory region which could be any kind of "memory block region", therefore I have no idea where the size expression is. We are supporting ~20 different allocations, which is nothing compared to the wild with the not so uncommon 5+ parameter allocators. Therefore I still do not want to reverse engineer a small MallocChecker + ExprEngine + BuiltinFunctionChecker inside my checker. They provide the necessary `DynamicSizeInfo` easily, which could be used in at least 4 checkers at the moment (which I have pointed out earlier in D68725). If I have the size expression in the dynamic size map, and I can clearly point out the destination buffer, it is a lot simpler to traverse the graph to find where the buffer and its size come from.
				CharussoAuthorUnsubmitted Done Reply Inline Actions Well, you really do not want to store `SizeExpr` of `malloc(SizeExpr)` and you are right I will have to traverse from it to see whether the `SizeExpr` is ambiguous or not, where it comes from. I want to rely on the `trackExpressionValue()` as the `SizeExpr` is available by `getDynamicSizeExpr()`, so it is one or two lines of code. Would you create your own switch-case to see where is the size expression goes in the allocation and use `trackExpressionValue()` on it? So that you do not store information in the global state which results in better run-time / less memory. At first I really wanted to model `malloc()` and `realloc()` and stuff, then I realized the `MallocChecker` provides every information I need. Would it be a better idea to create my own tiny `MallocChecker` inside my checker which does nothing but marks the size expression being interesting with `NoteTags`? Also I am thinking of a switch-case on the `DefinedOrUnknownSVal Size` which somewhere has an expression inside it which I could `trackExpressionValue()` on. Basically we are missing the rules what to use and I have picked the easiest solution. Could you share please which would be the right direction for such a simple task? Charusso: Well, you really do not want to store `SizeExpr` of `malloc(SizeExpr)` and you are right I will…
				NoQ:
				> I want to rely on the `trackExpressionValue()` as the `SizeExpr` is available by `getDynamicSizeExpr()`, so it is one or two lines of code.
				This won't work. `trackExpressionValue()` can only track an active expression (one that has, or at least should have, a value in the bug-node's environment). You'll have to make a visitor or a note tag. You can either make your own visitor (which will detect the node in which the extent information becomes present), or convert `MallocChecker` to use note tags and then inter-operate with those tags (through the interestingness map - "i mark the symbol as interesting so i'm interested in highlighting the allocation site" - or a similar mechanism). The second approach is more work because no such interoperation has ever been implemented yet, but it should be pretty rewarding for the future.
				Charusso:
				> This won't work. trackExpressionValue() can only track an active expression (that has, or at least should have, a value in the bug-node's environment). You'll have to make a visitor or a note tag.
				So because most likely after the `malloc()` the `size` symbol dies, the `trackExpressionValue()` cannot track dead symbols? We could make the lifetime of the `size` depend on the `buffer`; we have some dependency logic for that. It also represents the truth: the size is part of that memory block's region. After that, could we track the expression of the `size`?
				NoQ:
				> So because most likely after the malloc() the size symbol dies...?
				After the `malloc()` is consumed, the size expression dies and gets cleaned up from the Environment. The symbol will only die if the value wasn't put into the Store in the process of modeling the statement that consumes the `malloc()` expression (such as an assignment). But `trackExpressionValue()` can only track live (active) expressions.
				Charusso: I see. Now I have tried out what we have. The `trackExpressionValue()` has a lookup to find the node where the expression is available:
				    /// Find the ExplodedNode where the lvalue (the value of 'Ex')
				    /// was computed.
				    static const ExplodedNode* findNodeForExpression(const ExplodedNode *N, const Expr *Inner) {
				      while (N) {
				        if (N->getStmtForDiagnostics() == Inner)
				          return N;
				        N = N->getFirstPred();
				      }
				      return N;
				    }
				From that point the expression was alive, and tracking is fine. The `InnerPointerChecker` has introduced a place, `AllocationState.h`, to communicate with the `MallocBugVisitor`. I believe this is the simplest way to communicate.
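The backward walk quoted above can be illustrated outside of Clang. The following is only a minimal sketch: `PathNode` and `findNodeForStmt` are hypothetical stand-ins (statements are plain strings here), not the analyzer's `ExplodedNode` API. It shows why the lookup succeeds only when some node on the path to the bug actually evaluated the expression being searched for.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an ExplodedNode: each node remembers the
// statement it evaluated and its first predecessor on the path.
struct PathNode {
  std::string Stmt;     // statement evaluated at this node
  const PathNode *Pred; // first predecessor, or nullptr at the root
};

// Walk backwards from N until a node that evaluated 'Inner' is found.
// Returns nullptr when no node on the path evaluated it.
const PathNode *findNodeForStmt(const PathNode *N, const std::string &Inner) {
  while (N) {
    if (N->Stmt == Inner)
      return N;
    N = N->Pred;
  }
  return nullptr;
}
```

For a path `n * 2` -> `malloc(n * 2)` -> `gets(x)`, starting the search at the `gets(x)` node finds the node for `n * 2`, while a statement that was never evaluated on that path yields `nullptr`.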
				return None;
				}

				std::unique_ptr<PathSensitiveBugReport> getReport(BugType &BT,
				const CallEvent &Call,
				const CallContext &CallC,
				CheckerContext &C) {
				SmallString<128> Msg;
				llvm::raw_svector_ostream Out(Msg);
				Out << '\'' << Call.getCalleeIdentifier()->getName()
				<< "' could write outside of ";

				const Expr *DestExpr = Call.getArgExpr(*CallC.DestinationPos);
				static constexpr llvm::StringLiteral ArrayName = "array";

				auto DRE = declRefExpr().bind(ArrayName);
				auto ME = memberExpr().bind(ArrayName);

				auto Matches =
				match(expr(anyOf(ME, DRE, hasDescendant(ME), hasDescendant(DRE))),
				*DestExpr, C.getASTContext());

				if (Matches.size() == 1)
				Out << '\'' << exprToStr(Matches[0].getNodeAs<Expr>(ArrayName), C) << '\'';
				else
				Out << "the array";

				auto Report = std::make_unique<PathSensitiveBugReport>(
				BT, Out.str(), C.generateNonFatalErrorNode());

				// Track the allocation.
				SVal DestV = Call.getArgSVal(*CallC.DestinationPos);
				Report->addVisitor(allocation_state::getMallocBRVisitor(DestV.getAsSymbol()));
				return Report;
				}

				//===----------------------------------------------------------------------===//
				// Rewrite decision helper functions.
				//===----------------------------------------------------------------------===//

				// We only want to fix simple buffers because when the buffer has an offset
				// the new size expression needs to be modified according to the offset.
				// FIXME: Use the 'SValVisitor' to make sure the offset is not present or
				// subtract the offset from the new size expression.
				bool isSimpleBuffer(const CallEvent &Call, const CallContext &CallC) {
				const Expr *DestArg =
				Call.getArgExpr(*CallC.DestinationPos)->IgnoreImpCasts();

				if (const auto *DRE = dyn_cast<DeclRefExpr>(DestArg)) {
				// Do not try to fix in-parameter buffers.
				return !isa<ParmVarDecl>(DRE->getDecl());
				}

				return isa<MemberExpr>(DestArg);
				}

				//===----------------------------------------------------------------------===//
				// Code injection functions.
				//===----------------------------------------------------------------------===//

				NoQ: Also, which is probably more important, you will never be able to provide a fixit for the malloced memory case, because there may be multiple execution paths that reach the current point with different size expressions (in fact, not necessarily all of them are malloced). Eg.: `char *x = 0; char y[10]; if (coin()) { x = malloc(20); } else { x = y; } gets(x);` If you suggest replacing `gets(x)` with `gets_s(x, 20)`, you'll still have a buffer overflow on the else-branch on which `x` points to an array of 10 bytes.
				Charusso: This checker is going to evolve a lot, and all of the checked function calls have issues like that. I cannot even imagine what other issues they have. I would like to cover false-alarm suppression once we are actually about to raise alarms. Would that be okay? I really would like to see alarms first. For example, I have seen code in the wild with 8-parameter allocators, so we need to rely on the checkers to provide information about the allocation.
				NoQ: summons @Szelethus
				Apart from the obviously syntactic cases, you might actually be able to implement fixits for the situation when the reaching-definitions analysis displays exactly one definition for `x`, which additionally coincides with the allocation site. If that definition is a simple assignment, you'll be able to re-run the reaching definitions analysis for the RHS of that assignment. If that definition comes from a function call, you might be able to re-run the reaching definitions analysis on the return statement(s) of that function (note that this function must have been inlined during path-sensitive analysis, otherwise no definition in it would coincide with the allocation site). And so on.
				This problem sheds some light on how inter-procedural we want to make the reaching definitions analysis. My current guess is that we probably don't need to; we'd rather have this guided by re-running the reaching-definitions analysis based on the path-sensitive report data than have the reaching-definitions analysis be inter-procedural on its own.
				Charusso: That is a cool idea! I hope @Szelethus has time for his project.
				Szelethus: This sounds very cool! Once we're at the bug report construction phase, we can make reaching definitions analysis "interprocedural enough" for most cases, I believe.
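To make the "exactly one reaching definition" condition from this thread concrete, here is a minimal sketch of a reaching-definitions fixpoint for a single variable on a toy control-flow graph. Everything in it (`Block`, `reachingDefs`, `hasUniqueReachingDef`) is hypothetical and unrelated to Clang's CFG classes or to the reaching-definitions project mentioned above; it only illustrates the classic intraprocedural algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

// Toy CFG block for one tracked variable 'x' (hypothetical, not Clang's CFG).
struct Block {
  int Def;                // id of the definition of x in this block, -1 if none
  std::vector<int> Preds; // indices of predecessor blocks
};

// Classic forward reaching-definitions fixpoint for a single variable.
// Returns, per block, the set of definitions of x that reach its entry.
std::vector<std::set<int>> reachingDefs(const std::vector<Block> &CFG) {
  std::vector<std::set<int>> In(CFG.size()), Out(CFG.size());
  bool Changed = true;
  while (Changed) {
    Changed = false;
    for (std::size_t I = 0; I < CFG.size(); ++I) {
      // Join: union of everything flowing out of the predecessors.
      std::set<int> NewIn;
      for (int P : CFG[I].Preds)
        NewIn.insert(Out[P].begin(), Out[P].end());
      // A definition of x kills every other definition of x.
      std::set<int> NewOut =
          CFG[I].Def >= 0 ? std::set<int>{CFG[I].Def} : NewIn;
      if (NewIn != In[I] || NewOut != Out[I]) {
        In[I] = std::move(NewIn);
        Out[I] = std::move(NewOut);
        Changed = true;
      }
    }
  }
  return In;
}

// A fixit could reuse the allocation's size only if one definition reaches B.
bool hasUniqueReachingDef(const std::vector<std::set<int>> &In, std::size_t B) {
  assert(B < In.size());
  return In[B].size() == 1;
}
```

Encoding the `coin()` example as four blocks (`x = 0`; `x = malloc(20)`; `x = y`; `gets(x)`) gives two definitions reaching the `gets(x)` block, so no fixit should be offered there. In straight-line code with a single allocation, the set has exactly one element.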
				static void renameFunctionFix(StringRef NewName, const CallEvent &Call,
				PathSensitiveBugReport &Report) {
				unsigned CallNameLength = Call.getCalleeIdentifier()->getLength();
				SourceLocation CallBegin = Call.getSourceRange().getBegin();
				SourceRange CallNameRange(CallBegin,
				CallBegin.getLocWithOffset(CallNameLength - 1));

				const auto FuncNameFix = FixItHint::CreateReplacement(CallNameRange, NewName);
				Report.addFixItHint(FuncNameFix);
				}

				//===----------------------------------------------------------------------===//
				// Evaluating problematic function calls.
				//===----------------------------------------------------------------------===//

				void CERTStrChecker::evalGets(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const {
				unsigned DestPos = *CallC.DestinationPos;
				const Expr *DestArg = Call.getArgExpr(DestPos)->IgnoreImpCasts();
				SVal DestV = Call.getArgSVal(DestPos);

				auto Report = getReport(*BT, Call, CallC, C);
				NoQ: All right, so basically what you're saying is that literally every invocation of `gets()` deserves a warning. This means that for all practical purposes your checker is an AST-based checker, just implemented with path-sensitive callbacks. A path-sensitive checker emits warnings based on multiple events that happen sequentially along the path (use-after-free: "memory deallocated - that same memory used", division by zero: "value constrained to zero - something is being divided by that same value", etc.) but your checker emits the warning by looking at only one statement: "`gets()` is invoked".
				Do i understand correctly that your plan is to use path-sensitive analysis for fixits only? But you can't emit fixits for any truly path-sensitive warning anyway. Fixits must work correctly on all execution paths, so you cannot emit a correct fixit by looking at only one execution path. In order to emit fixits, you need to either resort to a pure AST check anyway ("this expression refers to an array of fixed size"), or maybe implement auxiliary data-flow analysis for a certain must-problem (eg., "the buffer argument may have exactly one possible value across all paths that reach `gets()`"). But in both cases the path-sensitive engine does literally nothing to help you; all the data that you'll need for your fixit will be available from the AST.
				NoQ: Like, i think this was an interesting investigation and i was genuinely curious about how this turns out to be, but for now it seems that the problem you're trying to solve cannot be solved this way. Path sensitive analysis is fundamentally applicable to only 50% of the problems (to "may-problems" but not to "must-problems"), and the problem you're trying to solve is in the latter category. I believe you'll have to fall back to the relatively boring task of adding fixits to `security.insecureAPI.gets`; but then, again, if you manage to employ use-def chains for this problem, that might be quite an inspiring start.
				Charusso: Most of the time the allocation that holds the arbitrary string happens in a local scope. After that we see `fscanf(dst)`, `gets(dst)`, `memcpy(dst)`, `strncpy(dst)` and so on, which push new data into that memory block, and then the cool developers write `dst[42] = '\0'`, which means all the reports on `dst` should be thrown away in a path-sensitive manner. Reallocations, re-bindings and other non-AST matters could be handled very easily with a non-AST checker like this one. Sometimes we work with a destination array like `memcpy(Foo[Bar.Baz]->Qux, ...)`, which could not really be handled with just a simple AST-based checker. I cannot say at the moment that we could handle it with symbols, but symbols give us a much larger scope of information. Because the Analyzer is much smarter than Tidy, most of the time we could emit fix-its very easily with the help of flow-sensitivity. I would create huge white-lists what we want to fix-it, and what we could not, but at some point, if we model the symbols better, we can. Other than the easy false-positive suppression and the tiny flow-sensitive rebinding handling, we could be sure what is going on at each string manipulation. The `gets()` is a toy example where at most a `grep -rn 'gets('` could do better analysis than us. The real world looks like this:
				    1 encryptedpasswordlen = ((strlen(passwd) + RADIUS_VECTOR_LENGTH - 1) / RADIUS_VECTOR_LENGTH) * RADIUS_VECTOR_LENGTH;
				    2 cryptvector = palloc(strlen(secret) + RADIUS_VECTOR_LENGTH);
				    3 memcpy(cryptvector, secret, strlen(secret));
				    ...
				    4 for (i = 0; i < encryptedpasswordlen; i += RADIUS_VECTOR_LENGTH) {
				    5 memcpy(cryptvector + strlen(secret), md5trailer, RADIUS_VECTOR_LENGTH);
				    ...
				from `postgresql/src/backend/libpq/auth.c`. At `3` we would emit a warning, because the wrong size leaves the string without null termination, but at `5` we see that it was left that way because the string continues at that offset, and, dunno, on `6`, once we model every piece of flow-sensitive information, the string is left non-terminated. Of course each of these is local and AST-based (with a huge overhead of rebindings and impossible false-positive suppression), but when you have two of them, that is when the fun begins.
				Charusso:
				> if you manage to employ use-def chains for this problem, that might be quite an inspiring start.
				We have regions, so we do not need to rely on such chains in the AST world, if I understand your idea correctly from the "Use-define chain" wiki [1]. Btw., it is not that difficult a problem in the AST world: you need to create a recursive AST matcher on the `DeclRefExpr` with `std::function`. Basically, I want to implement all the STR rules in a logical order. Here is one of the examples from STR32-C [2], which is my last planned project at the moment:
				    void lessen_memory_usage(void) {
				      wchar_t *temp;
				      size_t temp_size;
				      /* ... */
				      if (cur_msg != NULL) {
				        temp_size = cur_msg_size / 2 + 1;
				        temp = realloc(cur_msg, temp_size * sizeof(wchar_t));
				        /* temp &and cur_msg may no longer be null-terminated */
				        if (temp == NULL) {
				          /* Handle error */
				        }
				        cur_msg = temp;
				        cur_msg_size = temp_size;
				        cur_msg_len = wcslen(cur_msg);
				      }
				    }
				These examples really are meant to represent the wild. Please think of that problem in terms of an AST checker versus in terms of `getAsRegion` and `getDynamicElementCount`, which compare the size of the allocated memory block and inject `cur_msg[temp_size - 1] = L'\0';` because the array would overflow. How cool and how smart the Analyzer is to do so. It would took at most 10 minutes to implement if the `evalBinOp` would work or the main 10 years old implementation of obtaining the element-count would work. I am on the way to fixing the latter, but it will need more path-sensitive information than you could imagine; for example, reusing the "zombie" size expression turned out to be a hard problem. And I believe it will then be a lot easier to solve such problems. The local scope is the key, and at the moment this checker only tries to rewrite destination arrays which are local. I think we could detect when we emit multiple reports on a given call and, due to the ambiguity, drop such fix-its. I cannot imagine how difficult that would be with an AST checker.
				It is rather research at the moment, because I have encountered dozens of silly things, beginning with `getExtent()`, so I cannot say this direction is 100% the future, but I picked the Analyzer over Tidy for the reason that it is smarter.
				[1] https://en.wikipedia.org/wiki/Use-define_chain
				[2] https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string
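As an illustration of the re-termination this comment proposes to inject, here is a minimal sketch of the shrink-then-terminate pattern. The helper name `shrinkAndTerminate` and its signature are hypothetical; this is not code from the patch, only the shape of the result such a fix-it would aim for.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Shrinks a wide-string buffer to NewCount elements and re-terminates it.
// Returns the (possibly moved) buffer, or nullptr if realloc fails, in which
// case Msg is left untouched and is still owned by the caller.
wchar_t *shrinkAndTerminate(wchar_t *Msg, std::size_t NewCount) {
  assert(Msg && NewCount > 0);
  void *Tmp = std::realloc(Msg, NewCount * sizeof(wchar_t));
  if (!Tmp)
    return nullptr;
  wchar_t *Out = static_cast<wchar_t *>(Tmp);
  // After shrinking, the old terminator may have been cut off, so the last
  // element of the new block is explicitly set to the terminator.
  Out[NewCount - 1] = L'\0';
  return Out;
}
```

Without the explicit store to the last element, a later `wcslen()` on the shrunken buffer could read past the end of the allocation, which is the defect STR32-C describes.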
				NoQ:
				> It would took at most 10 minutes to implement if the `evalBinOp` would work or the main 10 years old implementation of obtaining the element-count would work.
				You have multiple different checkers that you want to implement and for each of them there are two parts of the problem: (1) Emit the warning, (2) Emit a fixit for that warning.
				Path-sensitive analysis is perfect for (1), at least for some checkers (not for the one in this patch). But if you try to rely on path-sensitive analysis for (2), your result will simply be incorrect for the reason stated above: there may be other execution paths that you haven't taken into account.
				And when you say
				> I would create huge white-lists what we want to fix-it
				, this literally means repeating a lot of the work you did in D45050, as you have to write down matchers (or a CFG-based data flow analysis) for the cases that you really can fix. At this point you will not only be aware of the allocation site from which you can extract the size-expression, but also of all other possible allocation sites.
				So i honestly believe that you should drop the idea of using path sensitive analysis to help you with fixits, and instead focus on the two more-or-less-independent tasks of (a) developing path-sensitive checkers for the CERT problems you're interested in and (b) developing fixits for such checkers in a syntactic manner without relying on the information obtained via the path-sensitive analysis.
				Charusso:
				>> It would took at most 10 minutes to implement if the evalBinOp would work or the main 10 years old implementation of obtaining the element-count would work.
				> You have multiple different checkers that you want to implement and for each of them there are two parts of the problem: (1) Emit the warning, (2) Emit a fixit for that warning.
				This kind of string-handling misuse in C has thousands of instances across random open source projects, which is why I would like to suggest fix-its. With the help of path-sensitive information I would like to prove that the fix-its are fine.
				> Path-sensitive analysis is perfect for (1), at least for some checkers (not for the one in this patch). But if you try to rely on path-sensitive analysis for (2), your result will simply be incorrect for the reason stated above: there may be other execution paths that you haven't taken into account.
				I know that path-sensitive analysis means "There is a path". But I also know that there is one single function body where the `(allocation, string manipulation, return data)` triplet takes place. Here we should take every execution path into account, because we are in a local scope. We could be 100% sure when the obtained size expression's region is not safe to reuse. I do not think about function boundaries, because people write code like that triplet. Sometimes the size expression comes from a function call, but that should be fine to obtain.
				> And when you say
				>> I would create huge white-lists what we want to fix-it
				> , this literally means repeating a lot of the work you did in D45050, as you have to write down matchers (or a CFG-based data flow analysis) for the cases that you really can fix. At this point you will not only be aware of the allocation site from which you can extract the size-expression, but also of all other possible allocation sites.
				There is one allocation site 99,99% of the time, where the 0,01% is your counter example in this review. Of course there must be such a case in the wild, somewhere, deep in the woods, and of course the most error-prone case is the 0,01%, but the other 99,99% has the same issue, and we could provide fix-its easily if we do not focus mostly on the 0,01%. I want to make my checker non-alpha. Even though neither the non-alpha checkers nor Tidy can provide 100% accuracy, a checker with fix-its should be 100% accurate. So in the 0,01% case we detect it and do not emit a fix-it, that is all. And we hope someone creates a summary-based analysis, so that we can rewrite the 0,01% as well.
				> So i honestly believe that you should drop the idea of using path sensitive analysis to help you with fixits, and instead focus on the two more-or-less-independent tasks of (a) developing path-sensitive checkers for the CERT problems you're interested in and (b) developing fixits for such checkers in a syntactic manner without relying on the information obtained via the path-sensitive analysis.
				Plot twist: neither the AST checkers nor the path-sensitive checkers can solve this issue. You cannot state that my approach is bad and that I should therefore change it. I cannot state that my approach is good and that we should therefore go for it. I believe that when I dropped my Tidy career I picked the right tool, one which I can improve to reach 100% accuracy and to model the impossible-to-model things, like the STR rules. Of course, in a path-sensitive manner, to drop every heuristic, to model everything necessary with offsets and allocations, and for false-positive suppression. The key here, to modify the Analyzer from that statement "There is a path" to "There is only a path", according to the use-case of the string manipulation being a single path.
				NoQ:
				> There is one allocation site 99,99% of the time, where the 0,01% is your counter example in this review.
				It might be 0,01% for this checker (and if so, a simple AST-based solution will work equally well), but for other checkers there will be a lot more problems: like, how do you null-terminate a string if the size of the string depends on the execution path?
				> The key here, to modify the Analyzer from that statement "There is a path" to "There is only a path", according to the use-case of the string manipulation being a single path.
				This is a possible approach to one of our open projects, "Implement a dataflow flamework", which would allow us to solve must-problems as well as may-problems. Like, it sounds easy: if we simply were able to figure out whether the analyzer has explored all paths or not in the current analysis, we will be able to solve may-problems in all cases in which the analyzer has actually explored all paths, by post-processing the `ExplodedGraph`. I've been advocating for this approach for some time in the past, but this approach is clearly a dead end because in most of the interesting cases (eg., in presence of any sort of loops) it'll reply "i don't know" ("we were clearly unable to explore all paths"). The more principled solution is to make an actual data flow framework in the usual sense (i.e., an API that would allow us to write "must-problem" checks in the same way as we currently write path-sensitive checks but with the extra join operation). I heard that @gribozavr has some plans in this area.
				Charusso:
				> how do you null-terminate a string if the size of the string depends on the execution path?
				Here is your example, slightly modified: `char *x = 0; char y[10]; if (coin()) { x = malloc(20); } else { x = y; } char src[] = "Foo"; strcpy(x, src);` Because the source string is only available after the allocation, we cannot make sure that the allocation can hold the entire source object, for which the appropriate allocation would be `malloc(strlen(src) + 1)` or something similar. Also, the `free()` immediately becomes another hard problem. This is the 0,01% case which you do not encounter in the wild. The usual approach is: have a memory block, fill it, return it. But the Static Analyzer is made for that purpose, so it can detect such path-sensitive information and prevent a fix-it from being emitted.
				> "Implement a dataflow flamework"
				That is a cool idea, but at first I want to think about local scopes. Within a single function body, that is the scope I can safely measure, I believe. If it turns out to be a good idea, we could improve on it step by step. I really wanted to measure my work, but every time I finished I had to rewrite it nearly from scratch, so I cannot say this checker is special enough to justify rewriting half of the Analyzer. Maybe it will be. I have learnt only one thing from Tidy: prove that the fix-it is valid; if it is, emit it, otherwise throw it away. Tidy cannot measure such information, and that is where the flow-sensitive part kicks in. If our engine were perfect, with dataflow-ctu-summary-dunnowhat analyses collaborating, maybe we would consider rewriting such crazily difficult code. I think we are so far away from that level that we have to avoid thinking about that kind of fix-it. But the 99,99% is totally worth it and easy to measure with the help of dynamic size information; I have implemented most of the STR31-C and STR32-C examples with fix-its based on that idea.
				I cannot emphasize enough how simple the way the string is manipulated is, and how easily it should be modeled with memory block regions or the dynamic "used size" of the memory block, and so on. It is just a sequence of data flow, from one given allocation to one given return, with the craziest offsets on the destination array and with the craziest custom allocators. If you would like to cover more: we are not interested in multiple buffers, but rather in one single buffer with a custom allocator where the size expression should be easy to obtain. I have taken that heuristic and it worked out quite well. For example, Git's allocator injects the null terminator, safely, according to a comment, when a wrong size such as `strlen(src)` is passed to it. So the checker would find possible false positives there, but a comment is not really the way we null-terminate. This is one of the most needed checkers, so I understand why you want to make it as precise as possible with the largest set of possibilities, but incremental development, eh? People are lame and cannot count, so they write `memcpy(dest, "Foobar", 6)`, where the next statements tell us whether the programmer was lame or not (and on the previous line the `memcpy()` has an appropriate size expression). Please think about that kind of vulnerability when there is a Windows update every week. That was my base case when I started this project 2 years ago.
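The `memcpy(dest, "Foobar", 6)` case mentioned above is easy to reproduce in isolation. The sketch below uses two hypothetical helpers (not part of the patch) to contrast a copy of exactly `strlen(Src)` bytes, which leaves the destination unterminated, with a bounded copy that includes the terminator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies Src the way the quoted call does: exactly strlen(Src) bytes, so
// the terminating '\0' is NOT copied and Dest keeps whatever was there.
void copyWithoutTerminator(char *Dest, const char *Src) {
  std::memcpy(Dest, Src, std::strlen(Src));
}

// Copies Src including its terminator. Cap is the capacity of Dest in
// bytes. Returns false (and writes nothing) when Dest is too small.
bool copyWithTerminator(char *Dest, std::size_t Cap, const char *Src) {
  std::size_t Len = std::strlen(Src);
  if (Len + 1 > Cap)
    return false;
  std::memcpy(Dest, Src, Len + 1);
  return true;
}
```

Whether the first form is a bug depends on what follows it: if a later statement stores `'\0'` at the right offset, the string is fine, and that later statement is the kind of path information this thread is arguing about.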
				if (!isSimpleBuffer(Call, CallC)) {
				C.emitReport(std::move(Report));
				return;
				}

				if (Optional<std::string> SizeStr = getDestSizeAsString(Call, CallC, C)) {
				renameFunctionFix(UseSafeFunctions ? "gets_s" : "fgets", Call, *Report);

				std::string ArgumentFix = ", " + *SizeStr;
				if (!UseSafeFunctions)
				ArgumentFix += ", stdin";

				auto CallFix =
				FixItHint::CreateInsertion(exprLocEnd(DestArg, C), ArgumentFix);
				Report->addFixItHint(CallFix);

				// Track the allocation's size expression.
				constexpr llvm::StringLiteral ExprName = "expr";
				ProgramStateRef State = C.getState();
				const ExplodedNode *N = C.getPredecessor();
				const MemRegion *DestMR = DestV.getAsRegion()->getBaseRegion();

				if (const Expr *SizeExpr = getDynamicSizeExpr(State, DestMR)) {
				auto Matches = match(expr(forEachDescendant(expr().bind(ExprName))),
				*SizeExpr, C.getASTContext());
				for (const BoundNodes &Match : Matches)
				if (const Expr *E = Match.getNodeAs<Expr>(ExprName))
				bugreporter::trackExpressionValue(N, E, *Report);
				}
				}

				C.emitReport(std::move(Report));
				}
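For readers unfamiliar with the replacement that `evalGets` suggests when `gets_s` is not requested, the following minimal sketch shows the intended shape of the fixed call as a standalone helper. `readLine` is a hypothetical name; the patch itself only rewrites the call site to `fgets(buf, <size>, stdin)`. `gets_s` is not shown because it belongs to the optional Annex K interface, which many C libraries do not provide.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Bounded replacement for 'gets(Buf)': reads at most Cap - 1 characters from
// In and always null-terminates. Returns false on EOF or error.
bool readLine(char *Buf, std::size_t Cap, std::FILE *In) {
  assert(Buf && Cap > 0 && In);
  if (!std::fgets(Buf, static_cast<int>(Cap), In))
    return false;
  // Unlike gets(), fgets() keeps the newline; drop it if present.
  Buf[std::strcspn(Buf, "\n")] = '\0';
  return true;
}
```

A call such as `readLine(buf, sizeof(buf), stdin)` truncates over-long input instead of writing past the end of `buf`. Note that `sizeof(buf)` only gives the capacity when `buf` is a true array in scope, which is in line with the checker's restriction to simple buffers.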

				//===----------------------------------------------------------------------===//
				// Main logic to evaluate a call.
				//===----------------------------------------------------------------------===//

				bool CERTStrChecker::evalCall(const CallEvent &Call, CheckerContext &C) const {
				if (const auto *Lookup = CDM.lookup(Call)) {
				const StrCheck &Check = Lookup->first;
				Check(this, Call, Lookup->second, C);
				}

				return false;
				}

				ProgramStateRef CERTStrChecker::checkRegionChanges(
				ProgramStateRef State, const InvalidatedSymbols *,
				ArrayRef<const MemRegion *> ExplicitRegions, ArrayRef<const MemRegion *>,
				const LocationContext *LCtx, const CallEvent *) const {
				return invalidateDynamicSizeExpr(State, ExplicitRegions, LCtx);
				}

				void CERTStrChecker::checkBeginFunction(CheckerContext &C) const {
				// Check the preprocessor only when the analysis begins.
				if (!C.inTopFrame())
				return;

				if (!UseSafeFunctions)
				return;

				Preprocessor &PP = C.getPreprocessor();
				if (PP.isMacroDefined("__STDC_LIB_EXT1__")) {
				MacroDefinition MD = PP.getMacroDefinition("__STDC_WANT_LIB_EXT1__");
				if (const MacroInfo *MI = MD.getMacroInfo()) {
				const Token &T = MI->tokens().back();
				StringRef ValueStr = StringRef(T.getLiteralData(), T.getLength());
				llvm::APInt IntValue;
				ValueStr.getAsInteger(10, IntValue);
				UseSafeFunctions = IntValue.getZExtValue();
				return;
				}
				}

				UseSafeFunctions = false;
				}

				void ento::registerCERTStrChecker(CheckerManager &Mgr) {
				auto *Checker = Mgr.registerChecker<CERTStrChecker>();

				Checker->UseSafeFunctions = Mgr.getAnalyzerOptions().getCheckerBooleanOption(
				Checker, "WantToUseSafeFunctions");

				Checker->BT = std::make_unique<BugType>(
				Mgr.getCurrentCheckerName(), "Insecure string handler function call",
				categories::SecurityError);
				}

				bool ento::shouldRegisterCERTStrChecker(const LangOptions &) { return true; }

clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp

	//=--- CommonBugCategories.cpp - Provides common issue categories -- C++ --=//			//=--- CommonBugCategories.cpp - Provides common issue categories -- C++ --=//
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"			#include "clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h"

	// Common strings used for the "category" of many static analyzer issues.			// Common strings used for the "category" of many static analyzer issues.
	namespace clang { namespace ento { namespace categories {			namespace clang {
				namespace ento {
				namespace categories {

	const char * const CoreFoundationObjectiveC = "Core Foundation/Objective-C";			const char *const CoreFoundationObjectiveC = "Core Foundation/Objective-C";
	const char * const LogicError = "Logic error";			const char *const LogicError = "Logic error";
	const char * const MemoryRefCount =			const char *const MemoryRefCount =
	"Memory (Core Foundation/Objective-C/OSObject)";			"Memory (Core Foundation/Objective-C/OSObject)";
	const char * const MemoryError = "Memory error";			const char *const MemoryError = "Memory error";
	const char * const UnixAPI = "Unix API";			const char *const UnixAPI = "Unix API";
	const char * const CXXObjectLifecycle = "C++ object lifecycle";			const char *const CXXObjectLifecycle = "C++ object lifecycle";
	}}}			const char *const SecurityError = "SecurityError";

				} // namespace categories
				} // namespace ento
				} // namespace clang

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp

//===- DynamicSize.cpp - Dynamic size related APIs --------------*- C++ -*-===//		//===- DynamicSize.cpp - Dynamic size related APIs --------------*- C++ -*-===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file defines APIs that track and query dynamic size information.		// This file defines APIs that track and query dynamic size information.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"
#include "clang/AST/Expr.h"		#include "clang/AST/Expr.h"
		#include "clang/ASTMatchers/ASTMatchFinder.h"
#include "clang/Basic/LLVM.h"		#include "clang/Basic/LLVM.h"
		#include "clang/StaticAnalyzer/Core/PathSensitive/AnalysisManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/ProgramState.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/SValBuilder.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h"		#include "clang/StaticAnalyzer/Core/PathSensitive/SymbolManager.h"

REGISTER_MAP_WITH_PROGRAMSTATE(DynamicSizeMap, const clang::ento::MemRegion *,		REGISTER_MAP_WITH_PROGRAMSTATE(DynamicSizeMap, const clang::ento::MemRegion *,
clang::ento::DynamicSizeInfo)		clang::ento::DynamicSizeInfo)

namespace clang {		namespace clang {
namespace ento {		namespace ento {

		using namespace ast_matchers;

		const DynamicSizeInfo *getDynamicSizeInfo(ProgramStateRef State,
		const MemRegion *MR) {
		return State->get<DynamicSizeMap>(MR);
		}

ProgramStateRef setDynamicSize(ProgramStateRef State, const MemRegion *MR,		ProgramStateRef setDynamicSize(ProgramStateRef State, const MemRegion *MR,
DefinedOrUnknownSVal Size,		DefinedOrUnknownSVal Size,
const Expr *SizeExpr) {		const Expr *SizeExpr) {
return State->set<DynamicSizeMap>(MR, {Size, SizeExpr});		return State->set<DynamicSizeMap>(MR, {Size, SizeExpr});
}		}

DefinedOrUnknownSVal getDynamicSize(ProgramStateRef State, const MemRegion *MR,		DefinedOrUnknownSVal getDynamicSize(ProgramStateRef State, const MemRegion *MR,
SValBuilder &SVB) {		SValBuilder &SVB) {
[... 22 lines not shown ...]	SVal ElementSizeV = SVB.makeIntVal(
Ctx.getTypeSizeInChars(ElementTy).getQuantity(), SVB.getArrayIndexType());		Ctx.getTypeSizeInChars(ElementTy).getQuantity(), SVB.getArrayIndexType());

SVal DivisionV =		SVal DivisionV =
SVB.evalBinOp(State, BO_Div, Size, ElementSizeV, SVB.getArrayIndexType());		SVB.evalBinOp(State, BO_Div, Size, ElementSizeV, SVB.getArrayIndexType());

return DivisionV.castAs<DefinedOrUnknownSVal>();		return DivisionV.castAs<DefinedOrUnknownSVal>();
}		}

		ProgramStateRef
		invalidateDynamicSizeExpr(ProgramStateRef State,
		ArrayRef<const MemRegion *> ExplicitRegions,
		const LocationContext *LCtx) {
		constexpr llvm::StringLiteral ExprName = "expr";
		for (const auto &Elem : State->get<DynamicSizeMap>()) {
		const DynamicSizeInfo &SizeInfo = Elem.second;
		if (!SizeInfo.isValid())
		continue;

		const Expr *SizeExpr = SizeInfo.getSizeExpr();
		if (!SizeExpr)
		continue;

		auto Matches =
		match(expr(forEachDescendant(expr().bind(ExprName))), *SizeExpr,
		State->getAnalysisManager().getASTContext());

		for (const BoundNodes &Match : Matches)
		if (const Expr *E = Match.getNodeAs<Expr>(ExprName))
		if (const auto *DRE = dyn_cast<DeclRefExpr>(E))
		if (const auto *VD = dyn_cast<VarDecl>(DRE->getDecl()))
		for (const MemRegion *MR : ExplicitRegions)
		if (State->getRegion(VD, LCtx) == MR)
		State = State->set<DynamicSizeMap>(
		Elem.first, {SizeInfo.getSize(), SizeInfo.getSizeExpr(),
		/*IsValid=*/false});
		}

		return State;
		}

} // namespace ento		} // namespace ento
} // namespace clang		} // namespace clang
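
The invalidation step above can be modeled outside of Clang. The following is a minimal sketch, not the patch's implementation: `SizeInfo`, `SizeMap`, `invalidateSizeExprs` and `sizeForFixIt` are hypothetical names, variables are identified by name rather than by `MemRegion`, and the fallback to the evaluated size is an assumption based on what the `str31-notes.cpp` expectations suggest (`size` is reused when unchanged, `14` is printed after `size` is reassigned).

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for DynamicSizeInfo: the evaluated size, the spelling
// of the size expression, the variables it mentions, and a validity flag.
struct SizeInfo {
  long Size;
  std::string ExprText;
  std::set<std::string> Vars;
  bool IsValid;
};

// Buffer name -> size information (stand-in for DynamicSizeMap).
using SizeMap = std::map<std::string, SizeInfo>;

// Mark every entry whose size expression mentions a modified variable.
void invalidateSizeExprs(SizeMap &Map, const std::set<std::string> &Modified) {
  for (auto &Entry : Map)
    for (const std::string &Var : Entry.second.Vars)
      if (Modified.count(Var))
        Entry.second.IsValid = false;
}

// Spell the size for a fix-it: reuse the expression while it is still valid,
// otherwise fall back to the evaluated value (an assumption of this sketch).
std::string sizeForFixIt(const SizeInfo &Info) {
  return Info.IsValid ? Info.ExprText : std::to_string(Info.Size);
}

// Mirrors test_size_redefinition: malloc(size + 1), then size is reassigned.
void exampleUsage() {
  SizeMap Map;
  Map["buf"] = SizeInfo{14, "size + 1", {"size"}, true};
  assert(sizeForFixIt(Map["buf"]) == "size + 1");
  invalidateSizeExprs(Map, {"size"});
  assert(sizeForFixIt(Map["buf"]) == "14");
}
```

Keeping the evaluated size next to the expression means an invalidated entry can still produce a usable bound, at the cost of printing a literal instead of the original spelling.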

clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp

[... 155 lines not shown ...]	auto reportPiece = [&](unsigned ID, FullSourceLoc Loc, StringRef String,
}		}
}		}
};		};

for (std::vector<const PathDiagnostic *>::iterator I = Diags.begin(),		for (std::vector<const PathDiagnostic *>::iterator I = Diags.begin(),
E = Diags.end();		E = Diags.end();
I != E; ++I) {		I != E; ++I) {
const PathDiagnostic *PD = *I;		const PathDiagnostic *PD = *I;
		const auto WarningPiece = PD->path.back();
reportPiece(WarnID, PD->getLocation().asLocation(),		reportPiece(WarnID, PD->getLocation().asLocation(),
PD->getShortDescription(), PD->path.back()->getRanges(),		PD->getShortDescription(), WarningPiece->getRanges(),
PD->path.back()->getFixits());		WarningPiece->getFixits());

// First, add extra notes, even if paths should not be included.		// First, add extra notes, even if paths should not be included.
for (const auto &Piece : PD->path) {		for (const auto &Piece : PD->path) {
if (!isa<PathDiagnosticNotePiece>(Piece.get()))		if (!isa<PathDiagnosticNotePiece>(Piece.get()))
continue;		continue;

reportPiece(NoteID, Piece->getLocation().asLocation(),		reportPiece(NoteID, Piece->getLocation().asLocation(),
Piece->getString(), Piece->getRanges(), Piece->getFixits());		Piece->getString(), Piece->getRanges(), Piece->getFixits());
}		}

if (!IncludePath)		if (!IncludePath)
continue;		continue;

// Then, add the path notes if necessary.		// Then, add the path notes if necessary.
PathPieces FlatPath = PD->path.flatten(/*ShouldFlattenMacros=*/true);		PathPieces FlatPath = PD->path.flatten(/*ShouldFlattenMacros=*/true);
for (const auto &Piece : FlatPath) {		for (const auto &Piece : FlatPath) {
if (isa<PathDiagnosticNotePiece>(Piece.get()))		if (isa<PathDiagnosticNotePiece>(Piece.get()))
continue;		continue;

		// Do not apply fix-its twice on the warning path piece.
		bool IsFixItDuplication = WarningPiece == Piece && ApplyFixIts;
		if (IsFixItDuplication)
		ApplyFixIts = false;

reportPiece(NoteID, Piece->getLocation().asLocation(),		reportPiece(NoteID, Piece->getLocation().asLocation(),
Piece->getString(), Piece->getRanges(), Piece->getFixits());		Piece->getString(), Piece->getRanges(), Piece->getFixits());

		if (IsFixItDuplication)
		ApplyFixIts = true;
}		}
}		}

if (!ApplyFixIts || Repls.empty())		if (!ApplyFixIts || Repls.empty())
return;		return;

Rewriter Rewrite(SM, LO);		Rewriter Rewrite(SM, LO);
if (!applyAllReplacements(Repls, Rewrite)) {		if (!applyAllReplacements(Repls, Rewrite)) {
[... 702 lines not shown ...]
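
The "do not apply fix-its twice" guard in the hunk above can be illustrated in isolation. This is a rough sketch under stated assumptions: `Piece`, `Emitter` and `emitAll` are hypothetical, and it assumes the reporting callback records replacements only while an `ApplyFixIts` flag is set, which the hunk implies but does not show.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a path diagnostic piece carrying fix-it hints.
struct Piece {
  std::string Text;
  std::vector<std::string> FixIts;
};

// Hypothetical stand-in for the reporting lambda and its captured state.
struct Emitter {
  bool ApplyFixIts;
  std::vector<std::string> Repls; // replacements collected so far

  void report(const Piece &P) {
    if (!ApplyFixIts)
      return;
    for (const std::string &F : P.FixIts)
      Repls.push_back(F);
  }
};

// Report the warning piece first, then every piece of the path. The warning
// piece is also the last path piece, so its fix-its are suppressed the second
// time by temporarily clearing the flag.
std::vector<std::string> emitAll(const std::vector<Piece> &Path,
                                 bool ApplyFixIts) {
  assert(!Path.empty() && "a report needs at least the warning piece");
  Emitter E{ApplyFixIts, {}};
  const Piece *Warning = &Path.back();
  E.report(*Warning);

  for (const Piece &P : Path) {
    bool IsFixItDuplication = (&P == Warning) && E.ApplyFixIts;
    if (IsFixItDuplication)
      E.ApplyFixIts = false;
    E.report(P);
    if (IsFixItDuplication)
      E.ApplyFixIts = true;
  }
  return E.Repls;
}

void exampleUsage() {
  std::vector<Piece> Path = {{"Memory is allocated", {}},
                             {"'gets' could write outside of 'buf'",
                              {"gets(buf) -> fgets(buf, 14, stdin)"}}};
  assert(emitAll(Path, /*ApplyFixIts=*/true).size() == 1);
  assert(emitAll(Path, /*ApplyFixIts=*/false).empty());
}
```

Without the guard, the same replacement would be recorded twice in this model, once for the warning and once for the matching path piece.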

clang/test/Analysis/Inputs/system-header-simulator.h

	[... 12 lines not shown ...]
	extern FILE *stdin;			extern FILE *stdin;
	extern FILE *stdout;			extern FILE *stdout;
	extern FILE *stderr;			extern FILE *stderr;
	// Include a variant of standard streams that occur in the pre-processed file.			// Include a variant of standard streams that occur in the pre-processed file.
	extern FILE *__stdinp;			extern FILE *__stdinp;
	extern FILE *__stdoutp;			extern FILE *__stdoutp;
	extern FILE *__stderrp;			extern FILE *__stderrp;

				typedef __SIZE_TYPE__ size_t;

	int scanf(const char *restrict format, ...);			int scanf(const char *restrict format, ...);
	int fscanf(FILE *restrict, const char *restrict, ...);			int fscanf(FILE *restrict, const char *restrict, ...);
	int printf(const char *restrict format, ...);			int printf(const char *restrict format, ...);
	int fprintf(FILE *restrict, const char *restrict, ...);			int fprintf(FILE *restrict, const char *restrict, ...);
	int getchar(void);			int getchar(void);
				char *gets(char *buffer);
				char *gets_s(char *buffer, size_t size);
				char *fgets(char *str, int n, FILE *stream);

	// Note, on some platforms errno macro gets replaced with a function call.			// Note, on some platforms errno macro gets replaced with a function call.
	extern int errno;			extern int errno;

	typedef __typeof(sizeof(int)) size_t;			typedef __typeof(sizeof(int)) size_t;

	size_t strlen(const char *);			size_t strlen(const char *);

	[... 73 lines not shown ...]
	#define INT64_MIN (-INT64_MAX-1)			#define INT64_MIN (-INT64_MAX-1)
	#define __DBL_MAX__ 1.7976931348623157e+308			#define __DBL_MAX__ 1.7976931348623157e+308
	#define DBL_MAX __DBL_MAX__			#define DBL_MAX __DBL_MAX__
	#ifndef NULL			#ifndef NULL
	#define __DARWIN_NULL 0			#define __DARWIN_NULL 0
	#define NULL __DARWIN_NULL			#define NULL __DARWIN_NULL
	#endif			#endif

	#define offsetof(t, d) __builtin_offsetof(t, d)			#define offsetof(t, d) __builtin_offsetof(t, d)
	No newline at end of file

clang/test/Analysis/analyzer-config.c

	[... 80 lines not shown ...]
	// CHECK-NEXT: optin.osx.cocoa.localizability.NonLocalizedStringChecker:AggressiveReport = false			// CHECK-NEXT: optin.osx.cocoa.localizability.NonLocalizedStringChecker:AggressiveReport = false
	// CHECK-NEXT: optin.performance.Padding:AllowedPad = 24			// CHECK-NEXT: optin.performance.Padding:AllowedPad = 24
	// CHECK-NEXT: osx.NumberObjectConversion:Pedantic = false			// CHECK-NEXT: osx.NumberObjectConversion:Pedantic = false
	// CHECK-NEXT: osx.cocoa.RetainCount:CheckOSObject = true			// CHECK-NEXT: osx.cocoa.RetainCount:CheckOSObject = true
	// CHECK-NEXT: osx.cocoa.RetainCount:TrackNSCFStartParam = false			// CHECK-NEXT: osx.cocoa.RetainCount:TrackNSCFStartParam = false
	// CHECK-NEXT: prune-paths = true			// CHECK-NEXT: prune-paths = true
	// CHECK-NEXT: region-store-small-struct-limit = 2			// CHECK-NEXT: region-store-small-struct-limit = 2
	// CHECK-NEXT: report-in-main-source-file = false			// CHECK-NEXT: report-in-main-source-file = false
				// CHECK-NEXT: security.cert.Str:WantToUseSafeFunctions = true
	// CHECK-NEXT: serialize-stats = false			// CHECK-NEXT: serialize-stats = false
	// CHECK-NEXT: silence-checkers = ""			// CHECK-NEXT: silence-checkers = ""
	// CHECK-NEXT: stable-report-filename = false			// CHECK-NEXT: stable-report-filename = false
	// CHECK-NEXT: suppress-c++-stdlib = true			// CHECK-NEXT: suppress-c++-stdlib = true
	// CHECK-NEXT: suppress-inlined-defensive-checks = true			// CHECK-NEXT: suppress-inlined-defensive-checks = true
	// CHECK-NEXT: suppress-null-return-paths = true			// CHECK-NEXT: suppress-null-return-paths = true
	// CHECK-NEXT: track-conditions = true			// CHECK-NEXT: track-conditions = true
	// CHECK-NEXT: track-conditions-debug = false			// CHECK-NEXT: track-conditions-debug = false
	// CHECK-NEXT: unix.DynamicMemoryModeling:Optimistic = false			// CHECK-NEXT: unix.DynamicMemoryModeling:Optimistic = false
	// CHECK-NEXT: unroll-loops = false			// CHECK-NEXT: unroll-loops = false
	// CHECK-NEXT: widen-loops = false			// CHECK-NEXT: widen-loops = false
	// CHECK-NEXT: [stats]			// CHECK-NEXT: [stats]
	// CHECK-NEXT: num-entries = 95			// CHECK-NEXT: num-entries = 96

clang/test/Analysis/cert/str31-alloc.cpp

This file was added.

				// RUN: %check_analyzer_fixit %s %t \
				// RUN: -analyzer-checker=core,unix,security.cert.Str \
				// RUN: -analyzer-config security.cert.Str:WantToUseSafeFunctions=true \
				// RUN: -I %S

				// See the examples on the page of STR31:
				// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

				#include "../Inputs/system-header-simulator.h"

				void *malloc(size_t size);
				void free(void *memblock);
				#define alloca(size) __builtin_alloca(size)

				#define __STDC_LIB_EXT1__ 1
				#define __STDC_WANT_LIB_EXT1__ 1

				void test_ambiguous_parameter(char *buf) {
				if (gets(buf)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				// CHECK-FIXES: if (gets(buf)) {}
				}

				// We cannot be sure why the offset set and the size would be different based
				// on the offset and its meaning.
				void test_offset() {
				char buff[13];
				if (gets(buff + 1)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buff'}}
				// CHECK-FIXES: if (gets(buff + 1)) {}
				}

				void test_malloc_known(size_t size) {
				char *buf1 = (char *)malloc(size);
				if (gets(buf1)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf1'}}
				// CHECK-FIXES: if (gets_s(buf1, size)) {}
				free(buf1);
				}

				void test_variable_array_ty(size_t size) {
				char buf2[size];
				if (gets(buf2)) {}
				NoQ: The fix is not correct. It should be `sizeof(buf3) - 1`, otherwise you still have a buffer overflow.
				Charusso (author): Good catch, thanks! I was really into the pretty-printing, we should not fix-it.
				// expected-warning@-1 {{'gets' could write outside of 'buf2'}}
				// CHECK-FIXES: if (gets_s(buf2, sizeof(buf2))) {}
				}

				void test_constant_array_ty(size_t size) {
				char buf3[13];
				if (gets(buf3)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf3'}}
				// CHECK-FIXES: if (gets_s(buf3, sizeof(buf3))) {}
				}

				template <typename T, size_t size>
				struct MyArray {
				T data[size];

				MyArray() { test_dependent_sized_array_ty(); }

				void test_dependent_sized_array_ty() {
				if (gets(data)) {}
				// expected-warning@-1 {{'gets' could write outside of 'data'}}
				// CHECK-FIXES: if (gets_s(data, sizeof(data))) {}
				}
				};

				void test_member() {
				MyArray<char, 13> buf4;
				if (gets(buf4.data)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf4.data'}}
				// CHECK-FIXES: if (gets_s(buf4.data, sizeof(buf4.data))) {}
				}

				void test_new(size_t size) {
				char *buf5;
				buf5 = new char[13];
				if (gets(buf5)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf5'}}
				// CHECK-FIXES: if (gets_s(buf5, 13)) {}
				delete[] buf5;
				}

				void test_alloca() {
				char *buf6 = (char *)alloca(13);
				if (gets(buf6)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf6'}}
				// CHECK-FIXES: if (gets_s(buf6, 13)) {}
				}

clang/test/Analysis/cert/str31-notes.cpp

This file was added.

				// RUN: %check_analyzer_fixit %s %t \
				// RUN: -analyzer-checker=core,unix,security.cert.Str \
				// RUN: -analyzer-config security.cert.Str:WantToUseSafeFunctions=true \
				// RUN: -analyzer-output=text -I %S

				// See the examples on the page of STR31:
				// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

				#include "../Inputs/system-header-simulator.h"

				// The following is not defined therefore the safe functions are unavailable.
				// #define __STDC_LIB_EXT1__ 1
				#define __STDC_WANT_LIB_EXT1__ 1

				void *malloc(size_t size);
				void free(void *memblock);

				void test_simple_size(unsigned size) {
				size = 13;
				// expected-note@-1 {{The value 13 is assigned to 'size'}}

				char *buf = (char *)malloc(size);
				// expected-note@-1 {{Memory is allocated}}

				if (gets(buf)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				// expected-note@-2 {{'gets' could write outside of 'buf'}}

				// CHECK-FIXES: if (fgets(buf, size, stdin)) {}
				free(buf);
				}

				void test_size_redefinition() {
				unsigned size = 13;
				// expected-note@-1 {{'size' initialized to 13}}

				char *buf = (char *)malloc(size + 1);
				// expected-note@-1 {{Memory is allocated}}

				size = 42;

				if (gets(buf)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				// expected-note@-2 {{'gets' could write outside of 'buf'}}

				// CHECK-FIXES: if (fgets(buf, 14, stdin)) {}
				free(buf);
				}

clang/test/Analysis/cert/str31-safe.cpp

This file was added.

				// RUN: %check_analyzer_fixit %s %t \
				// RUN: -analyzer-checker=core,unix,security.cert.Str \
				// RUN: -analyzer-config security.cert.Str:WantToUseSafeFunctions=true \
				// RUN: -I %S

				// See the examples on the page of STR31:
				// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

				#include "../Inputs/system-header-simulator.h"

				#define __STDC_LIB_EXT1__ 1
				#define __STDC_WANT_LIB_EXT1__ 1

				namespace test_gets_bad {
				#define BUFFER_SIZE 1024

				void func(void) {
				char buf[BUFFER_SIZE];
				if (gets(buf)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				// CHECK-FIXES: if (gets_s(buf, sizeof(buf))) {}
				}
				} // namespace test_gets_bad

				namespace test_gets_good {
				enum { BUFFERSIZE = 32 };

				void func(void) {
				char buff[BUFFERSIZE];

				if (gets_s(buff, sizeof(buff))) {}
				}
				} // namespace test_gets_good
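
Comparing the `CHECK-FIXES` lines of `str31-safe.cpp` and `str31-unsafe.cpp`, the suggested replacement appears to depend on whether `__STDC_LIB_EXT1__` is defined. A minimal sketch of that choice follows; `suggestGetsReplacement` and its parameters are hypothetical and only reproduce the strings the tests expect, not the checker's actual code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build the replacement text for a `gets(Buf)` call.
// HasLibExt1 models "__STDC_LIB_EXT1__ is defined"; WantSafe models the
// security.cert.Str:WantToUseSafeFunctions option.
std::string suggestGetsReplacement(const std::string &Buf,
                                   const std::string &Size, bool HasLibExt1,
                                   bool WantSafe) {
  if (HasLibExt1 && WantSafe)
    return "gets_s(" + Buf + ", " + Size + ")";
  // Assumed fallback when the bounds-checking interfaces are unavailable.
  return "fgets(" + Buf + ", " + Size + ", stdin)";
}

void exampleUsage() {
  // str31-safe.cpp: __STDC_LIB_EXT1__ is defined.
  assert(suggestGetsReplacement("buf", "sizeof(buf)", true, true) ==
         "gets_s(buf, sizeof(buf))");
  // str31-unsafe.cpp: __STDC_LIB_EXT1__ is not defined.
  assert(suggestGetsReplacement("buf", "sizeof(buf)", false, true) ==
         "fgets(buf, sizeof(buf), stdin)");
}
```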

clang/test/Analysis/cert/str31-unsafe.cpp

This file was added.

				// RUN: %check_analyzer_fixit %s %t \
				// RUN: -analyzer-checker=core,unix,security.cert.Str \
				// RUN: -analyzer-config security.cert.Str:WantToUseSafeFunctions=true \
				// RUN: -I %S

				// See the examples on the page of STR31:
				// https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

				#include "../Inputs/system-header-simulator.h"

				// The following is not defined therefore the safe functions are unavailable.
				// #define __STDC_LIB_EXT1__ 1
				#define __STDC_WANT_LIB_EXT1__ 1

				namespace test_gets_bad {
				#define BUFFER_SIZE 1024

				void func(void) {
				char buf[BUFFER_SIZE];
				if (gets(buf)) {}
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				// CHECK-FIXES: if (fgets(buf, sizeof(buf), stdin)) {}
				}
				} // namespace test_gets_bad

				namespace test_gets_good {
				enum { BUFFERSIZE = 32 };

				void func(void) {
				char buff[BUFFERSIZE];

				if (fgets(buff, sizeof(buff), stdin)) {}
				}
				} // namespace test_gets_good
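
For reference, the bound that `fgets` enforces can be observed directly: per the C standard it stores at most `n - 1` characters and then a terminating null character. The sketch below is an illustration only; `readBounded` is a hypothetical helper, and it reads from a temporary file instead of `stdin` so that it runs unattended.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper: write Input to a temporary stream, read it back with
// fgets into Buf, and return the number of characters stored (excluding the
// terminating null), or -1 if the temporary file or the read failed.
long readBounded(const char *Input, char *Buf, int BufSize) {
  std::FILE *F = std::tmpfile();
  if (!F)
    return -1;
  std::fputs(Input, F);
  std::rewind(F);
  char *Result = std::fgets(Buf, BufSize, F);
  std::fclose(F);
  return Result ? static_cast<long>(std::strlen(Buf)) : -1;
}

void exampleUsage() {
  char Buf[8];
  long N = readBounded("0123456789ABCDEF\n", Buf,
                       static_cast<int>(sizeof(Buf)));
  if (N < 0)
    return; // temporary file unavailable in this environment
  // Seven characters plus the terminator fit into the 8-byte buffer.
  assert(N == 7);
  assert(std::strcmp(Buf, "0123456") == 0);
}
```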


[analyzer] CERTStrChecker: Model gets() (Abandoned, Public)


Revision Contents

Diff 228675

clang/include/clang/Lex/Preprocessor.h

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h

clang/lib/StaticAnalyzer/Checkers/AllocationState.h

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp

clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp

clang/lib/StaticAnalyzer/Core/DynamicSize.cpp

clang/lib/StaticAnalyzer/Frontend/AnalysisConsumer.cpp

clang/test/Analysis/Inputs/system-header-simulator.h

clang/test/Analysis/analyzer-config.c

clang/test/Analysis/cert/str31-alloc.cpp

clang/test/Analysis/cert/str31-notes.cpp

clang/test/Analysis/cert/str31-safe.cpp

clang/test/Analysis/cert/str31-unsafe.cpp
