This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
docs/analyzer/
-
analyzer/
5/5
checkers.rst
-
include/clang/
-
clang/
-
AST/
-
FormatStringParsing.h
-
StaticAnalyzer/Checkers/
-
Checkers/
10/10
Checkers.td
-
lib/
-
AST/
-
FormatString.cpp
-
FormatStringParsing.h
-
PrintfFormatString.cpp
-
ScanfFormatString.cpp
-
StaticAnalyzer/Checkers/
-
Checkers/
2/2
AllocationState.h
2/2
CMakeLists.txt
-
MallocChecker.cpp
-
cert/
17/17
StrChecker.cpp
-
test/Analysis/
-
Analysis/
-
Inputs/
-
system-header-simulator.h
-
cert/
-
str31-c-extras.cpp
8/8
str31-c-notes.cpp
4/4
str31-c.cpp

Differential D70411

[analyzer] CERT STR rule checkers: STR31-C
Needs ReviewPublic

Authored by Charusso on Nov 18 2019, 3:07 PM.

Download Raw Diff

Details

Reviewers

NoQ
baloghadamsoftware

Summary

This patch introduces a new experimental checker:
alpha.security.cert.str.31c

This checker is implemented based on the following rule:
STR31-C: Guarantee that storage for strings has sufficient space for
character data and the null terminator:

https://wiki.sei.cmu.edu/confluence/x/sNUxBQ

It warns on misusing the following functions:
strcpy(), gets(), fscanf(), sprintf().

Diff Detail

Event Timeline

Charusso created this revision.Nov 18 2019, 3:07 PM

Herald added subscribers: cfe-commits, dkrupp, donat.nagy and 7 others. · View Herald TranscriptNov 18 2019, 3:07 PM

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
18 ↗	(On Diff #229928)	That change is totally unrelated.

Charusso mentioned this in D69813: [analyzer] CERTStrChecker: Model gets().Nov 18 2019, 3:23 PM

NoQ added inline comments.Nov 19 2019, 3:58 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280	Why is this a false positive? You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it?

In D70411#1750690, @Charusso wrote:

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

The name seems reasonable enough to me -- it at least has all the relevant information in it, which is a great start.

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Added a report when the not null-terminated string is read.
Added comments.
Fixed some uninteresting nits.
No ASCII-art yet.

In D70411#1754356, @NoQ wrote:

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Hm, the idea is cool, but my checker is not that complex, given that now I have added comments. Thanks, I will adjust the comments with some kind of drawing.

My main problem was that to create a NoteTag when the not null-terminated string is read, but it is after when we emit an error, so I could not emit a note. That is why it emits two different reports, and somehow I need to convert the function call evaluation warning to a note in case when a not null-terminated string is read. Do we have any plans with NoteTags to support the craziest checkers? What do you think about storing the reports in the program state?

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280	Hm, yes, I wanted to add comments earlier, sorry. It is still wonky a little-bit, somehow I need to merge the two different errors.

Charusso added inline comments.Nov 20 2019, 5:21 PM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
171	We can do the opposite to see whether the destination array's type is an array, so that we do not need `getDynamicSizeExpr`, yay.

Remove the report storing map so we do not traverse backwards on the bug-path.
Use NoteTags instead of reports on problematic function calls.
Emit a report only if a not null-terminated string is read.
Store whether the string is null-terminated.

Now I have simplified the checker so we do not need any ASCII-art, I believe. Do we have any better logic than the current implementation to catch when the string is read?

Charusso added a parent revision: D69746: [analyzer] FixItHint: Apply and test hints with the Clang-Tidy's script.Nov 24 2019, 1:01 PM

Do not emit a report on re-binding a not null-terminated string, it may will be properly terminated (test added).
Catch the null-termination on symbolic level in checkBind().
Added a test for the necessity of SValVisitor in this checker.
Some refactor.

The false positive suppression by going backwards on the bug-path of stored reports was a very simple concept, which turned out to be a useless one. The rough idea was that to invalidate every report in a path, and every other report would be left because they are reachable from a different path and they are valid errors. So that:

void test_wonky_null_termination(const char *src) {
  char dest[128];
  strcpy(dest, src);
  dest[sizeof(dest) - 1] = '\0';
  do_something(dest); // no-warning
}

The above example null-terminates the string on every path, so we remove the report, and:

void test_may_not_every_path_null_terminated(const char *src) {
  char dest[128];
  strcpy(dest, src);
  if (strlen(src) > sizeof(dest)) {
    dest[sizeof(dest) - 1] = '\0';
  }
  do_something(dest);
  // expected-warning@-1 {{'dest' is not null-terminated}}
}

We say that the above example has a path where we cannot invalidate the report. Invalidating one single control-flow path I think should be safe and keeps every other report like the above examples. In general there is no such simple case exist because we check for the destination array being null so we encounter a state-split whatever we do. Also may it was a bad idea to rewrite the BugReporter for a simple checker.

This null-termination-by-hand made false positives because even the buffer could overflow, we stop up the flow, so the Vasa will not sink. I wanted to emit reports on problematic function calls, but changing the main logic to emit an error on the string-reading made the null-termination store-able. So that we could drop this backward path-traversing logic, that is why I have not answered yet. However, if we still want to emit an error on function-calls, and every of them, we need to be able to chain the calls to the same region, which made by the reports in the previous revisions. Also may we would use this report-storing idea later.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
928	In my mind the documentation is the CERT rule's page. Is it okay?
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
15 ↗	(On Diff #231341)	Facepalm.

Charusso added a child revision: D70805: [analyzer] SValHasDescendant: Determine whether the SVal has an SVal descendant.Nov 28 2019, 1:40 AM

Tweaking.

I have picked curl as an example, because it has a measurable amount of reports. I still recommend to make this alpha, because the expression-tracking cannot track members, and peoples are using members everywhere.

str31-c.tar.gz868 KBDownload

It is made with CodeChecker:

--analyzers clangsa \
--disable default \
--enable core \
--enable unix \
--enable security.cert.str.StrCheckerBase \
--enable security.cert.str.31.c \
--saargs sa_args.txt

$ cat sa_args.txt

-Xclang -analyzer-config -Xclang silence-checkers="core;unix"

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!". This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
928	I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule.
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗	(On Diff #232198)	Mmm, no, it's not fine. The warning is saying something that's not correct. Even if `src` is not null-terminated, under the assumption that no out-of-bounds read UB occurs in the `strcpy()` call, `dest` is going to always be null-terminated and have a `strlen` of at most 127. And by continuing our analysis after `strcpy()`, we inform the user that we assume that no UB has happened on the current execution path yet. Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened. For example, when we dereference a pointer, if the pointer is not known to be null, we actively assume that it's non-null. Similarly, if the `src` string is not null-terminated, we should warn at the invocation of `strcpy`; if it's not known whether it's null-terminated or not, we should actively assume that it's null-terminated at `strcpy`. Also, I usually don't agree with the statements like "This report helped us find a real bug, therefore it's a true positive". You can find a bug in literally any code if you stare at it long enough! I'm glad you found the bug anyway, but if the report is not pointing at the problem directly, we're not doing a great job.
51–53 ↗	(On Diff #232198)	Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context: void copy_and_test_if_short(const char src) { char dest[128]; strcpy(dest, src); warn_if_starts_with_foo(dest); } void copy_and_test(const char src) { if (strlen(src) < 64) copy_and_test_if_short(src); else copy_and_test_if_long(src); }
clang/test/Analysis/cert/str31-c-notes.cpp
23	I still believe that this should be the warning in this code. This is already broken code.
27–28	It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.

In D70411#1769628, @Charusso wrote:

str31-c.tar.gz868 KBDownload

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.

content_encoding.c_5ae4f30ce29f14441139e7bbe20eeaaa.plist.html232 KBDownload

The source string is taken from a global table and therefore has known maximum size.

ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html344 KBDownload

The source string is an IP address and port, which has a known limit on the number of digits it can have.

imap.c_fd80e0804acd9e7ecb9c2483151625a9.plist.html265 KBDownload

The source string is a literal '*'.

tftp.c_eee38be8d783d25f2e733fa3740d13fc.plist.html254 KBDownload

There's an if-statement that checks that the storage size is sufficient.

tool_dirhie.c_87dd00a845a927c9b8ed587c6b314af1.plist.html208 KBDownload

The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Make the checker alpha.
Add docs.
Copy-paste @NoQ's test.

Please let us measure your examples at first to have a consensus what we would like to see from this checker.

In D70411#1769803, @NoQ wrote:

In D70411#1769628, @Charusso wrote:

str31-c.tar.gz868 KBDownload

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.

I believe I can enhance the checker with the help of NoteTags and give more information for the user why the buffer space is Not sufficient. The size property's upper-bound is not that difficult to measure, I think we can implement this. For now, let me analyze each report.

content_encoding.c_5ae4f30ce29f14441139e7bbe20eeaaa.plist.html232 KBDownload

The source string is taken from a global table and therefore has known maximum size.

imap.c_fd80e0804acd9e7ecb9c2483151625a9.plist.html265 KBDownload

The source string is a literal '*'.

When you encounter a member that is when the hand-waving begins. I have made a dump to see what we have. I have not seen any immediate way to connect the dynamic size and the members. For now we have that:
The source region is:
SymRegion{reg_$36<const char * SymRegion{derived_$35{conj_$32{int, LC7, S161233, #1},Element{encodings,0 S64b,const struct content_encoding_s *}}}.name>}
and the allocation's dynamic size is:
extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}

My assumption is that, the size of a member is very arbitrary and we cannot track the member's allocation, so we even could drop every of these reports as being useless. Because I make this checker alpha I am fine with that assumption of members, and we definitely need better ideas.

ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html344 KBDownload

The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated. For example the VIM project has one single global macro to serve every allocation's size, and that is why the CERT rule made. If the attacker passes a 10k characters long string and you allocate 10k+1 characters for the '\0' you can work with the data safely. Of course you would need to check some upper bound, but that is another story. The key is, any arbitrary buffer size could be a problem. Here:

954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming the condition is true
                       ? INET6_ADDRSTRLEN
                       : strlen(string_ftpport)

addrlen is an arbitrary macro size INET6_ADDRSTRLEN, which not necessarily could hold string_ftpport in strcpy(addr, string_ftpport);. As a human-being you know the most appropriate size, yes. But the program could be ill-formed very easily so we warn on the arbitrary-length path. Here you do not see something like addr[INET6_ADDRSTRLEN - 1] = '\0' so when the string is read you cannot be sure whether it is null-terminated. If line 954 would be evaluated to false, allocating calloc(addrlen + 1, 1) so when addrlen is strlen(string_ftpport) is fine of course, that is the appropriate size to store string_ftpport.

tftp.c_eee38be8d783d25f2e733fa3740d13fc.plist.html254 KBDownload

There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct what is going on:

513 | sbytes += tftp_option_add(state, sbytes,
                                (char *)state->spacket.data + sbytes, // warning
                                TFTP_OPTION_TSIZE);

My guess was that to change the check::PostCall to check::PreCall, but if I remember right the same report occurred. Here we return from the function call tftp_option_add, then we warn on its argument which already has been passed to the call. We should not warn when the array is passed to a function which we already have finished reading from. I have not seen any immediate solution, so I count this as an internal ~~bug~~ feature.

tool_dirhie.c_87dd00a845a927c9b8ed587c6b314af1.plist.html208 KBDownload

The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Let me simplify the report as there is no such feature:

111 | outlen = strlen(outfile);
112 | outdup = strdup(outfile);
116 | dirbuildup = malloc(outlen + 1);

125 | tempdir = strtok(outdup, PATH_DELIMITERS);
136 | if(outdup == tempdir)
138 |   strcpy(dirbuildup, tempdir);

Here we see both tempdir's and dirbuildup's size is based on strlen(outfile) but strtok() strikes in and we conjure a new symbol for dirbuildup so the connection is lost unfortunately.

Please let us brainstorming here.

In D70411#1769786, @NoQ wrote:

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Well, if the member's allocation and the expression-value-tracking would not be hand-waving, such a simple checker could serve a lot. I see your point and you are right the rules are not made for tools. Our tool already knows most of the functionality what is needed for the STR rules. It is started as an experiment and it is working enough well, in my point of view. The member-value-tracking and the lack of standard function modeling of course affects this checker, but that issue affects every other checker as well. If we call them non-alpha, it could be even non-alpha.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

There is not that much heuristic:

If you use the unsafe calls and before reading the string you manage to null-terminate it (inject '\0'), it would be a false positive.
If you have a blob of memory without considering the length of the stuff it will store, it is a true positive.

Please note that, truncation is fine:

To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size

~ from the rule's page

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Let me summarize:

There is a lot more to do, we have seen too many issues to call this a finished checker.
The problem is not the size, but the missing '\0' (which you can have multiple of at any point).
We heavily need to swap the value-tracking with NoteTags to make this useful.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗	(On Diff #232198)	Well, I think we could check the range info and catch if we have an infeasible case. For now on one of the paths it looks like the string is not null-terminated, where we warn. Of course we need better modeling, that is why I have stated out with a FIXME. I totally agree with hand-waving "true positives", but this is very unlikely to happen, I just came out with that example to state out the `CStringModelingChecker` is not enough in its current shape.
51–53 ↗	(On Diff #232198)	That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for STR31-C. In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue. Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.
clang/test/Analysis/cert/str31-c-notes.cpp
23	No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than `grep`. Truncation is fine for STR31-C. That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.
27–28	That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.

Charusso edited the summary of this revision. (Show Details)Dec 6 2019, 4:15 PM

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

"The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1773703, @Charusso wrote:

In D70411#1769786, @NoQ wrote:

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

In D70411#1773703, @Charusso wrote:

ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html344 KBDownload

The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

We heavily need to swap the value-tracking with NoteTags to make this useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a warning as soon as we've reached far enough on the execution path to conclude that there's a problem.

clang/test/Analysis/cert/str31-c-notes.cpp
23	`Truncation is fine for STR31-C.` This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data". This is absolutely the right place to simply warn on every invocation of `gets()`.
27–28	That is when the string is read. That strikes me as an example of the Maslow's hammer. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem. That said, maybe you could get away with discriminating between `char ` and `const char `. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.

NoQ added inline comments.Dec 9 2019, 5:56 PM

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
925	Maybe `31c`? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable `cert.str.31`.
930	`alpha.cert.str`.
clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
23 ↗	(On Diff #232671)	These strings are visible to humans, please write some proper English :)
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗	(On Diff #232198)	Note that function `copy_and_test` in my example would not be necessarily visible to you. It may be in another translation unit. So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.

Fix.

In D70411#1776460, @NoQ wrote:

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

"The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.

Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

If I am right you are thinking about warn on the unsafe function calls which is the first case. I did that on by default. The idea is that, in a security manner we cannot allow hand-waving fixes. To measure stuff we could introduce options which serves the second case. I have added an option to suppress that common null-termination-by-hand idiom, so that I do not recommend anything. The CERT rules are not recommendations so I think now the warning is fine.

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the most of.

In D70411#1773703, @Charusso wrote:

ftp.c_8cb07866de61ef56be82135a4d3a5b7e.plist.html344 KBDownload

The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the copied stuff's length cannot be larger than the destination's arbitrary capacity, we would not warn. There are tests for that case.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root of security checkers, but I am mostly interested in data-flow analysis.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗	(On Diff #232198)	That unknowness idea is cool, I will leave that comment as not `Done`.
clang/test/Analysis/cert/str31-c-notes.cpp
23	I have found out we have another recommendation for that type of issues: https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now.
27–28	Cool idea, thanks!

Charusso mentioned this in D71433: [analyzer] CERT: POS34-C.Dec 13 2019, 4:33 AM

Iterate over parameters rather arguments for searching string-reading.
Better notes of the argument's index.
Avoid C.getPredecessor(), use the error-node instead.

Charusso removed a parent revision: D69746: [analyzer] FixItHint: Apply and test hints with the Clang-Tidy's script.Jan 30 2020, 10:36 AM

2020-ify the checker writing
Remove extra bullet-points of CERT checker documentation: we only need the checker's documentation, not the packages.

Charusso added a parent revision: D69726: [analyzer] DynamicSize: Store the dynamic size.Jan 31 2020, 9:17 AM

whisperity added a subscriber: whisperity.Mar 23 2020, 3:08 AM

Herald added subscribers: ASDenysPetrov, martong, steakhal. · View Herald TranscriptMar 23 2020, 3:08 AM

balazske added a subscriber: balazske.Mar 23 2020, 4:27 AM

balazske added inline comments.

clang/docs/analyzer/checkers.rst
1983	There are already more checkers that can check for CERT related problems but not specially made for these. These checkers do not reside in this new `cert` group. And generally a checker does not check for specifically a CERT rule, instead for more of them or other things too, or more checkers can detect a single rule. (And the user can think that only these CERT rules are checkable that exist in this package, that is not true.) So I do not like the introduction of this new `cert` package. (The documentation of existing checkers lists if the checker is designed for a CERT rule.)
clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22 ↗	(On Diff #241752)	Are there already not other checkers that find security related bugs (the taint checker?)? Why do these not use a `SecurityError`? It is not bad to have a `SecurityError` but maybe there is a reason why was it not there already. If these categories are exclusive it is hard to find out what problem (probably already existing bug type in other checkers) belongs to what category (it can be for this checker `UnixAPI` or `MemoryError` too?).

Szelethus added inline comments.Mar 23 2020, 5:47 AM

clang/docs/analyzer/checkers.rst
1983	I disagree to some extent. I think it would be great to have a `cert` package that houses all checkers for each of the rules with the addition of checker aliases. Clang-tidy has something similar as well!

Thanks for the feedback! Given that it will remain an alpha checker for a long time (~1 year), no one really should use it.

clang/docs/analyzer/checkers.rst
1983	I designed the checker only for the rule STR31-C that is why I have picked package `cert`. Clang-Tidy could aliasing checks. For example the check could be in the `bugprone` category aliased to `cert` and both could trigger the analysis. the user can think that only these CERT rules are checkable We only need to move `security.FloatLoopCounter` under the `cert` package. What else SEI CERT rules are finished off other than package `cert`? It is not my fault if someone could not package well or could not catch most of the issues of a SEI CERT rule or could not reach the SEI CERT group to note the fact the Analyzer catch a very tiny part of a rule. However, this patch package well and could catch most of the STR31-C rule.
clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22 ↗	(On Diff #241752)	We have only three pure security checkers: `security.insecureAPI`, `security.FloatLoopCounter` (FLP30-C, FLP30-CPP) and `alpha.security.cert.pos.34c`. The non-alpha checkers are very old, but Ted definitely made that category: const char *bugType = "Floating point variable used as loop counter"; BR.EmitBasicReport(bugType, "Security", os.str().c_str(), FS->getLocStart(), ranges.data(), ranges.size()); https://github.com/llvm/llvm-project/commit/9c49762776b4d2cb16911dec0c873c90a6508968 Every other security-ish checker does not catch the CERT rule's examples. May if the checker evolves I would pick MemoryError, but it will not evolve soon.

balazske added inline comments.Mar 24 2020, 10:04 AM

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
930	This text may be hard to understand. The option means "if it is off, the problem report happens only if there is no hand-written null termination of the string"? I looked at the CERT rule but did not find out why is null termination by hand important here. (I did not look at the code to find out what it does.) And "WarnOnCall" suggests that there is warning made on calls if on, or not at all or at other location if off, is this correct?
clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28	This documentation could be improved or left out.
clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗	(On Diff #241752)	Maybe I have wrong expectations about what this checker does but did understand it correctly yet. The main problem here is that `dest` may be not large enough (if size of `src` is not known). This would be a better text for the error message? And if this happens (the write outside) it is already a fatal error, can cause crash. If no buffer overflow happens the terminating zero is copied from `src`, so why would `dst` be not null terminated?

Charusso marked 4 inline comments as done.Mar 24 2020, 11:45 AM

Charusso added inline comments.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
930	You let the buffer overflow by a bugprone function call, then you adjust the null-terminator by simply `buf[size] = '\0'`, then you made sure the buffer cannot overflow, since you have terminated it. It is a very common idiom therefore I do not think a hand-written null-termination is a security issue. The SEI CERT rules are all theoretical so you will not find anything useful in practice. My solution is practical. This text may be hard to understand. Please note that this text only made for Static Analyzer developers. Let us rephrase it then: Whether the checker needs to warn on the bugprone function calls immediately or look for bugprone hand-written null-termination of bugprone function call made strings. It is a common idiom to null-terminate the string by hand after the insecure function call produce the string which could be misused so that it is on by default. It is useful to turn it off to reduce the noise of the checker, because people usually null-terminate the string by hand immediately after the bugprone function call produce the string. Do you buy that?
clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28	Totally, I have already forgotten I wanted to remove it. Thanks!
clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗	(On Diff #241752)	I like that error message because that is the truth, that is the issue. If it would crash so the operating system would detect a fatal error we should not develop Static Analyzer and vulnerabilities would not exist. That would be cool, btw.

balazske added inline comments.Mar 25 2020, 1:58 AM

clang/docs/analyzer/checkers.rst
1983	So I can move this checker: D71510 into `alpha.security.cert.err.33c`?
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
930	First sentence is OK, the later part relatively better. Probably the explanations about why it is good to turn it on or off can be left out, these are misunderstandable. (If the manual null-termination is done, in one case it is a "common idiom but can be misused so make warning" on other case "these warnings generates only extra noise".) It can be something like this (if correct): Whether the checker wars on the function calls immediately or warns on bugprone hand-written null-termination of function call made strings. Default value is on because null-terminating the string by hand after the insecure function call could be misused. But it is a common idiom to manually null-terminate the strings immediately after the function calls so it could be turned off to reduce noise of the checker. I do not like the word "bugprone", is it correct english? ("Problematic" or "insecure" could be used instead.)
clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗	(On Diff #241752)	There must be some misunderstanding here, I still do not know what this test does. Can probably somebody other explain it? After the first `strcpy` the `src` content is copied into `dest`. If `src` is not a null-terminated string this is itself a bug and `strcpy` is undefined behavior. If `src` has `strlen(src) > 127` it is a bug again because the copy can write outside of `dest` past its end, into the stack frame of the function. After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing. (This `strcpy` call is the place for a warning. Probably not a fatal error because it is not sure, if size of `src` is not known.) Otherwise result of `strcpy` is a null-terminated string in `dest`. The next `*dest = '\0'` causes that `dest` points to an empty string, why is this needed here (it was already null terminated)? The following `strcpy` overwrites it anyway.

"To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size to hold the character data" - from the rule's page.
Most of the projects are fine truncating by hand because the write happens in somewhat well-bounded strings: IP-addresses, names, numbers... I wanted to make this as practical as possible. Until you are having a null-terminated string without being read, you are most likely fine. Feel free to try this out, probably you would already understand the WarnOnCall option very well.

clang/docs/analyzer/checkers.rst
1983	Well, not exactly. No one should care about alpha checkers except advanced Static Analyzer developers like you do right now. Since alpha is a dead-zone you can put your work anywhere. The core issue was Ted's placing of CERT rules. I have written the first CERT-checker after Ted's, which introduced such packaging, then it became dead, so I have lost interest to make the package-aliasing a thing. If you wish to move to `cert`, feel free, but please note that, your first idea of bad-packaging is right. We really need the functionality of package-aliasing but I have ran out of time and the project died as well.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
930	`bugprone` comes from the Clang-Tidy world and we use it everywhere, but let us change it to `insecure` then. I will copy-paste it. Thanks!
clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗	(On Diff #241752)	Well, I did not make any comments about what is going on, because it is far away of being finished, it is an alpha checker. Until we have not designed it, it is useless to write documentation, but feel free to read the discussion of the checker's evolution, so that I do not need to repeat myself. Once we have designed it, sure, I will document the design. Let us first clear your misconceptions: After the first strcpy the src content is copied into dest. If src is not a null-terminated string this is itself a bug and strcpy is undefined behavior. Cool, but this is another rule namely STR32-C: D71033. Here we do not care about that issue. Each test should be isolated and test the behavior what we expect. We do not care about another patch/rule here. If src has strlen(src) > 127 it is a bug again because the copy can write outside of dest past its end, into the stack frame of the function. It is a bug in a security point of view. However, in the wild, no one really cares about it, and null-terminate the string by hand. That is why this is the test file when `WarnOnCall=false` to reduce the noise of the checker if someone is fine with such concept, but still want to catch non-null-terminated strings. So that this checker servers the needs of "enough space for holding the string" and "null-terminated strings" given by the assumption `src` is already null-terminated or `dest` is null-terminated by hand before the string is possibly read. Most of the C projects have a constant global array which could hold 1024 characters and they do not care if overflow happens, but still care if null-termination happens so truncate the string by hand. After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing. Tell to the C open source project (like VIM) writers they are dumb. I still want to make sure they could benefit from this checker in a non-security point of view. (This strcpy call is the place for a warning. Probably not a fatal error because it is not sure, if size of src is not known.) Nope, D71033. Otherwise result of strcpy is a null-terminated string in dest. Nope, in a non-security point of view there is no issue, because the normal behavior of the program to read an IP-address or something bounded and just a truncation needed for the enter/space characters to the end. This is a normal C-programmer idiom and on projects where no-one cares about security, like VIM. The next dest = '\0' causes that dest points to an empty string, Yes. why is this needed here (it was already null terminated)? The following strcpy overwrites it anyway. `dest = '\0'` demonstrates the usual C-programmer behavior. They somehow null-terminate the string then everything is alright, no buffer-overflow read could happen. However, on the second time the null-termination may not happen since the `dest` buffer moves out of scope and possibly read by `do_something()`. If it is read in a non-null-terminated way, that is a problem in their world. Probably you get it now why such option introduced. Feel free to try this out. The thing you will see, we cannot point out interesting properties of the bug, such as the destination buffer, so that we cannot give great bug reports, so that the checker remain alpha and useless.

Get rid of the secondary behavior for now.
Fix review comments.

Given that the secondary behavior confuse people I have removed it for now. May if someone introduce a NullTerminationChecker then we introduce such option to warn on insecure calls instant. Thanks @balazske for influencing that change. @NoQ this project had a deadline like half year ago, could you smash that green button please?

balazske added inline comments.Apr 1 2020, 1:29 AM

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56	The functions `gets`, `strcpy` return a null-terminated string according to standard, probably `sprintf` too, and the `fscanf` `%s` output is null-terminated too even if no length is specified (maybe it writes outside of the specified buffer but null terminated).
clang/test/Analysis/cert/str31-c.cpp
118	Do we get a warning if the `+1` above is missing?

Simplify tests.
Remove dead code, they are far away to being used.
Add an extra test case.

Thanks for the review, hopefully if I ping @NoQ in every round, it will be green-marked soon.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56	From where do you observe such information? I have not seen anything `strcpy()`-specific in the standard. My observation clearly says that, if there is not enough space for the data you could even write to overlapping memory, or as people says, you could meet nasal demons. It is called undefined behavior.
clang/test/Analysis/cert/str31-c.cpp
118	Test added.

Remove the last dead comment.

Refactor.

Charusso added a child revision: D71033: [analyzer] CERT STR rule checkers: STR32-C.May 24 2020, 9:26 PM

Charusso removed a child revision: D70805: [analyzer] SValHasDescendant: Determine whether the SVal has an SVal descendant.

baloghadamsoftware added inline comments.Jun 26 2020, 4:56 AM

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
934	Maybe we could have more descriptive names for these checkers and mention the number of the rule in the `HelpText` only.
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
67	I would avoid reusing the prefix `check` for functions that are not inherited from `Checker`. Alternatives could be `handle`, `process` etc. for modeling and `verify`, `validate` etc. for checking for error.

baloghadamsoftware added inline comments.Jun 26 2020, 5:12 AM

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
21	Are there so many things common among checkers for SEI CERT string rules that the best option is to implemenet them in a single huge class/file? Would not it be more appropriate to create a library instead for common functions and then implement each rule in a separate checker class? Huge classes are difficult to understand.

baloghadamsoftware requested changes to this revision.Jun 29 2020, 7:36 AM

baloghadamsoftware added inline comments.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56	From `cppreference.com`: `strcpy()` -- Copies the null-terminated byte string pointed to by src, including the null terminator, to the character array whose first element is pointed to by dest. `gets()` -- //https://en.cppreference.com/w/c/io/gets// `scanf`, `fscanf`, `sscan` format `%s` -- If width specifier is used, matches up to width or until the first whitespace character, whichever appears first. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters) Thus @balazske is completely right. All these functions add the null-terminator. Of course, if the string does not fit, it overwrites the memory after the buffer which is undefined behavior. But the null-terminator is definitely added.
101	Do we really need a separate utility function for that?
137	Create `SmallString`, use a stream to print into it, create another one in `printPretty`, convert it to `std::string` and then copy it again to the original `SmallString`? Why?
166	`sprintf` not non-compliant generally. You should calculate the possible maximal length of the string and compare it to the size of the target region.
181	Checking for `%s` is not enough. `%2s` may also be a problem if the buffer is `1` character long.
clang/test/Analysis/cert/str31-c.cpp
28	Why do we need a different buffer size here? And why `enum`?

This revision now requires changes to proceed.Jun 29 2020, 7:36 AM

Herald added a subscriber: rnkovacs. · View Herald TranscriptJun 29 2020, 7:36 AM

whisperity removed a subscriber: whisperity.Jul 9 2020, 1:01 AM

whisperity added a subscriber: whisperity.

Please do not bypass the previous comments that hadn't reached a conclusion -- littering inlines about miscellaneous stuff at this stage does more harm then good, and derails the discussion.

Its not the first time I took a look, and I made some progress today again, but there is still much to learn, so I'll wait a bit before throwing my two cents in.

Resolve most of the review comments.
We really need to specify the design of future checkers.

Thanks for the reviews!

In D70411#2153562, @Szelethus wrote:

Please do not bypass the previous comments that hadn't reached a conclusion -- littering inlines about miscellaneous stuff at this stage does more harm then good, and derails the discussion.

Ugh, it is a long story. I really wanted to create the best and trendiest checker of all time with Artem so all the comments are made for him at first. We have had the exact opposite sense what we are trying to achieve here, that is why we left the project dead.

Hopefully I get your suggestion right and you want to make sure I have addressed all the comments. Let me begin with a suggestion as well:

If you are interested in the solution, please do not care about the problem, but the solution.

non-literal advice from It's Not About the Shark: How to Solve Unsolvable Problems

Let me rewind / summarize what problems happened in chunks if you would Ctrl+F5 for certain quotes and what we did in chronological order:

You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it?

I have attached multiple bug reports to the program state (by registering a map) and invalidated all the non-fatal reports in chain on the bug-path to a given fatal-report to support two functionality of the checker.

That two functionality was a complete stupidity it turned out, I have left only the necessary one.

I think it would really help if you draw a state machine for the checker

Because I have removed the complexity of the checker now it is easy to understand and it does not require an appendix.

But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Artem refuse the usefulness of the CERT rules from the static analysis perspective, which I agree with.

"if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn"

Artem suggests to develop taint analysis, which I do not want to develop, but his concerns are real.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened.

That was the near the point when I have realized we need to warn on the function calls immediately. The buffer overflow is immediately a security issue, but I was dumb to realize.

I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways.

Artem believes the arrays are cool because they cannot overflow. For example we obtain an IP address, which has enough space as char[64] our mental model suggests. I believe any overflow is not cool in a security point of view: https://twitter.com/TwitterSupport/status/1283526400146837511

Now I have accepted Artem's suggestion and I do not care about arrays because the Analyzer cannot model the constraints of the array size yet / taintness. If someone working with a plain char array that could be easy to detect and measure. For the security problems we want to catch allocations.

The known size does not mean that the string is going to be null-terminated.
The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

I still wanted to care about null-termination, however the entire checker should be about overflow.

"The checker warns every time it finds an execution path on which unintended behavior occurs"

"The checker enforces a certain coding guideline on the user": (say, "don't pass the string by pointer anywhere before you null-terminate it")

If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

Artem tried to create a direction for the checker, it was pretty complex. I have refused again, because all the people null terminate by hand.

This text may be hard to understand.

Balazs get confused by the second behavior, like Artem did for a while. I have tried to rephrase the documentation. With Balazs' help we made the documentation cool (Thanks again!).

Get rid of the secondary behavior for now.

That was the time when I have learnt complexity is a huge issue and people will get confused. I have removed the entire secondary behavior of null termination checking and moved some logic to STR32-C.

The conclusion: Now this checker is very strict and hopefully enough simple. I hope we could design the future of checker-writing.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
934	The naming is designed with Aaron and Artem, so that it is consistent with the Clang-Tidy naming. We really need to support aliasing names here.
clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
21	things common among checkers for SEI CERT string rules Well, yes. best option is to implemenet them in a single huge file Sadly that is our standard now and there is no API to make it well-engineered by default. This is my first real checker and I could not really be the pioneer of creating the new standard of checker writing as I am not that experienced on my own. I believe if all the Ericsson people would gather and brainstorm for a long period of time and create the initial better way of doing checkers as a design document we could improve such document as the whole community. If some extent of consensus met by most of the developers, we could start to create a better environment, which already would create the design how should we split up the files. I do not see the immediate gains of splitting up stuff as long as we do not create such design to make sure we do not need to split up or move code anymore. I wanted to follow the latest trends with that checker and eventually come up with the design how to write checkers. After that I have realized it is way too much work for an individual and it is a very long run where such change is necessary: in function summaries (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html)
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56	Well, let assume that we rely on the cppreference page from now on as our best approximation of the required behavior. Thanks for the notes. `strcpy()`: Copies the null-terminated byte string pointed to by src, including the null terminator All these functions add the null-terminator. Copies does not mean it adds the null-terminator to end of the `dest` explicitly. It is possible the `src` is already not null-terminated so there is no way to copy the `'\0'`, and so that the STR32-C rule made: https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string I had not checked the other functions in the standard. `gets()` and `fscanf()` is going to be null-terminated according to cppreference and the IEEE standard, cool. I will release my fix on that topic in D71033.
67	The core checkers which I have seen use the prefix `check` which I became in love with. I have not split the modeling/checking logic up enough well so there is no need to use prefix `verify/validate` for now, but I prefer to use the prefix `eval` for the same reason.
101	Yes, it is handy.
137	I like the style of the stream chained together and the easy usage of `printPretty()`. Basically that is why.
166	I really wanted to check for a single `%s`, but somehow I have removed it, facepalm. Thanks!
181	Yes, there are other problems to check. I wanted to catch the given examples at first to make sure the patch is easy to understand. Thanks for the idea, we definitely need to use more of the dynamic size support. I have added this extra feature for both `fscanf()` and `sprintf()`.
clang/test/Analysis/cert/str31-c.cpp
28	I have copied the examples of the rule STR31-C as-is. I like the variety of buffer sizes.

Szelethus mentioned this in D72705: [analyzer] Added new checker 'alpha.unix.ErrorReturn'..Sep 29 2020, 9:59 AM

Revision Contents

Path

Size

clang/

docs/

analyzer/

checkers.rst

12 lines

	include/	clang/	AST/
		lib/	AST/

FormatStringParsing.h

19 lines

include/

clang/

StaticAnalyzer/

Checkers/

Checkers.td

15 lines

lib/

AST/

FormatString.cpp

2 lines

FormatStringParsing.h

PrintfFormatString.cpp

20 lines

ScanfFormatString.cpp

17 lines

StaticAnalyzer/

Checkers/

AllocationState.h

2 lines

CMakeLists.txt

1 line

MallocChecker.cpp

4 lines

cert/

StrChecker.cpp

405 lines

test/

Analysis/

Inputs/

system-header-simulator.h

6 lines

cert/

str31-c-extras.cpp

95 lines

str31-c-notes.cpp

47 lines

str31-c.cpp

184 lines

Diff 278368

clang/docs/analyzer/checkers.rst

Show First 20 Lines • Show All 1,974 Lines • ▼ Show 20 Lines	[NSString stringWithFormat:
reminderCount];		reminderCount];
}		}

alpha.security		alpha.security
^^^^^^^^^^^^^^		^^^^^^^^^^^^^^


alpha.security.cert		alpha.security.cert
^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^
		balazskeUnsubmitted Done Reply Inline Actions There are already more checkers that can check for CERT related problems but not specially made for these. These checkers do not reside in this new `cert` group. And generally a checker does not check for specifically a CERT rule, instead for more of them or other things too, or more checkers can detect a single rule. (And the user can think that only these CERT rules are checkable that exist in this package, that is not true.) So I do not like the introduction of this new `cert` package. (The documentation of existing checkers lists if the checker is designed for a CERT rule.) balazske: There are already more checkers that can check for CERT related problems but not specially made…
		SzelethusUnsubmitted Done Reply Inline Actions I disagree to some extent. I think it would be great to have a `cert` package that houses all checkers for each of the rules with the addition of checker aliases. Clang-tidy has something similar as well! Szelethus: I disagree to some extent. I think it would be great to have a `cert` package that houses all…
		CharussoAuthorUnsubmitted Done Reply Inline Actions I designed the checker only for the rule STR31-C that is why I have picked package `cert`. Clang-Tidy could aliasing checks. For example the check could be in the `bugprone` category aliased to `cert` and both could trigger the analysis. the user can think that only these CERT rules are checkable We only need to move `security.FloatLoopCounter` under the `cert` package. What else SEI CERT rules are finished off other than package `cert`? It is not my fault if someone could not package well or could not catch most of the issues of a SEI CERT rule or could not reach the SEI CERT group to note the fact the Analyzer catch a very tiny part of a rule. However, this patch package well and could catch most of the STR31-C rule. Charusso: I designed the checker only for the rule STR31-C that is why I have picked package `cert`.
		balazskeUnsubmitted Done Reply Inline Actions So I can move this checker: D71510 into `alpha.security.cert.err.33c`? balazske: So I can move this checker: D71510 into `alpha.security.cert.err.33c`?
		CharussoAuthorUnsubmitted Done Reply Inline Actions Well, not exactly. No one should care about alpha checkers except advanced Static Analyzer developers like you do right now. Since alpha is a dead-zone you can put your work anywhere. The core issue was Ted's placing of CERT rules. I have written the first CERT-checker after Ted's, which introduced such packaging, then it became dead, so I have lost interest to make the package-aliasing a thing. If you wish to move to `cert`, feel free, but please note that, your first idea of bad-packaging is right. We really need the functionality of package-aliasing but I have ran out of time and the project died as well. Charusso: Well, not exactly. No one should care about alpha checkers except advanced Static Analyzer…

SEI CERT checkers which tries to find errors based on their `C coding rules <https://wiki.sei.cmu.edu/confluence/display/c/2+Rules>`_.		SEI CERT checkers which tries to find errors based on their `C coding rules <https://wiki.sei.cmu.edu/confluence/display/c/2+Rules>`_.

.. _alpha-security-cert-pos-checkers:		.. _alpha-security-cert-pos-checkers:

alpha.security.cert.pos		alpha.security.cert.pos
^^^^^^^^^^^^^^^^^^^^^^^		^^^^^^^^^^^^^^^^^^^^^^^

Show All 11 Lines	int func(const char *var) {
char env[1024];		char env[1024];
int retval = snprintf(env, sizeof(env),"TEST=%s", var);		int retval = snprintf(env, sizeof(env),"TEST=%s", var);
if (retval < 0 \|\| (size_t)retval >= sizeof(env)) {		if (retval < 0 \|\| (size_t)retval >= sizeof(env)) {
/* Handle error */		/* Handle error */
}		}

return putenv(env); // putenv function should not be called with auto variables		return putenv(env); // putenv function should not be called with auto variables
}		}

		.. _alpha-security-cert-str-31c:

		alpha.security.cert.str.31c
		"""""""""""""""""""""""""""

		Checker for `STR31-C rule<https://wiki.sei.cmu.edu/confluence/x/sNUxBQ>`_.

		It warns on misusing the following functions:
		``strcpy()``, ``gets()``, ``fscanf()``, ``sprintf()``.

.. _alpha-security-ArrayBound:		.. _alpha-security-ArrayBound:

alpha.security.ArrayBound (C)		alpha.security.ArrayBound (C)
"""""""""""""""""""""""""""""		"""""""""""""""""""""""""""""
Warn about buffer overflows (older checker).		Warn about buffer overflows (older checker).

.. code-block:: c		.. code-block:: c

▲ Show 20 Lines • Show All 549 Lines • Show Last 20 Lines

clang/include/clang/AST/FormatStringParsing.h

This file was moved from clang/lib/AST/FormatStringParsing.h.

Show First 20 Lines • Show All 84 Lines • ▼ Show 20 Lines	public:
bool hasValue() const { return Start != nullptr; }		bool hasValue() const { return Start != nullptr; }
const T &getValue() const {		const T &getValue() const {
assert(hasValue());		assert(hasValue());
return FS;		return FS;
}		}
const T &getValue() { return FS; }		const T &getValue() { return FS; }
};		};

		typedef SpecifierResult<analyze_printf::PrintfSpecifier> PrintfSpecifierResult;

		PrintfSpecifierResult ParsePrintfSpecifier(FormatStringHandler &H,
		const char &Beg, const char E,
		unsigned &argIndex,
		const LangOptions &LO,
		const TargetInfo &Target, bool Warn,
		bool isFreeBSDKPrintf);

		typedef SpecifierResult<analyze_scanf::ScanfSpecifier> ScanfSpecifierResult;

		// FIXME: Much of this is copy-paste from ParsePrintfSpecifier.
		// We can possibly refactor.
		ScanfSpecifierResult ParseScanfSpecifier(FormatStringHandler &H,
		const char &Beg, const char E,
		unsigned &argIndex,
		const LangOptions &LO,
		const TargetInfo &Target);

} // end analyze_format_string namespace		} // end analyze_format_string namespace
} // end clang namespace		} // end clang namespace

#endif		#endif

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td

	Show First 20 Lines • Show All 67 Lines • ▼ Show 20 Lines

	def Security : Package <"security">;			def Security : Package <"security">;
	def InsecureAPI : Package<"insecureAPI">, ParentPackage<Security>;			def InsecureAPI : Package<"insecureAPI">, ParentPackage<Security>;
	def SecurityAlpha : Package<"security">, ParentPackage<Alpha>;			def SecurityAlpha : Package<"security">, ParentPackage<Alpha>;
	def Taint : Package<"taint">, ParentPackage<SecurityAlpha>;			def Taint : Package<"taint">, ParentPackage<SecurityAlpha>;

	def CERT : Package<"cert">, ParentPackage<SecurityAlpha>;			def CERT : Package<"cert">, ParentPackage<SecurityAlpha>;
	def POS : Package<"pos">, ParentPackage<CERT>;			def POS : Package<"pos">, ParentPackage<CERT>;
				def STR : Package<"str">, ParentPackage<CERT>;

	def Unix : Package<"unix">;			def Unix : Package<"unix">;
	def UnixAlpha : Package<"unix">, ParentPackage<Alpha>;			def UnixAlpha : Package<"unix">, ParentPackage<Alpha>;
	def CString : Package<"cstring">, ParentPackage<Unix>;			def CString : Package<"cstring">, ParentPackage<Unix>;
	def CStringAlpha : Package<"cstring">, ParentPackage<UnixAlpha>;			def CStringAlpha : Package<"cstring">, ParentPackage<UnixAlpha>;

	def OSX : Package<"osx">;			def OSX : Package<"osx">;
	def OSXAlpha : Package<"osx">, ParentPackage<Alpha>;			def OSXAlpha : Package<"osx">, ParentPackage<Alpha>;
	▲ Show 20 Lines • Show All 832 Lines • ▼ Show 20 Lines

	let ParentPackage = POS in {			let ParentPackage = POS in {

	def PutenvWithAuto : Checker<"34c">,			def PutenvWithAuto : Checker<"34c">,
	HelpText<"Finds calls to the 'putenv' function which pass a pointer to "			HelpText<"Finds calls to the 'putenv' function which pass a pointer to "
	"an automatic variable as the argument.">,			"an automatic variable as the argument.">,
	Documentation<HasDocumentation>;			Documentation<HasDocumentation>;

	} // end "alpha.cert.pos"			} // end "alpha.cert.pos"
				NoQUnsubmitted Done Reply Inline Actions Maybe `31c`? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable `cert.str.31`. NoQ: Maybe `31c`? I'm afraid a dot in package name would be confusing and people will actually try…

				let ParentPackage = STR in {

				CharussoAuthorUnsubmitted Done Reply Inline Actions In my mind the documentation is the CERT rule's page. Is it okay? Charusso: In my mind the documentation is the CERT rule's page. Is it okay?
				NoQUnsubmitted Done Reply Inline Actions I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule. NoQ: I'd prefer our own documentation that may link to the CERT page but should also explain, say…
				def StrCheckerBase : Checker<"StrCheckerBase">,
				HelpText<"SEI CERT base checker of rules defined in STR">,
				NoQUnsubmitted Done Reply Inline Actions `alpha.cert.str`. NoQ: `alpha.cert.str`.
				balazskeUnsubmitted Done Reply Inline Actions This text may be hard to understand. The option means "if it is off, the problem report happens only if there is no hand-written null termination of the string"? I looked at the CERT rule but did not find out why is null termination by hand important here. (I did not look at the code to find out what it does.) And "WarnOnCall" suggests that there is warning made on calls if on, or not at all or at other location if off, is this correct? balazske: This text may be hard to understand. The option means "if it is off, the problem report happens…
				CharussoAuthorUnsubmitted Done Reply Inline Actions You let the buffer overflow by a bugprone function call, then you adjust the null-terminator by simply `buf[size] = '\0'`, then you made sure the buffer cannot overflow, since you have terminated it. It is a very common idiom therefore I do not think a hand-written null-termination is a security issue. The SEI CERT rules are all theoretical so you will not find anything useful in practice. My solution is practical. This text may be hard to understand. Please note that this text only made for Static Analyzer developers. Let us rephrase it then: Whether the checker needs to warn on the bugprone function calls immediately or look for bugprone hand-written null-termination of bugprone function call made strings. It is a common idiom to null-terminate the string by hand after the insecure function call produce the string which could be misused so that it is on by default. It is useful to turn it off to reduce the noise of the checker, because people usually null-terminate the string by hand immediately after the bugprone function call produce the string. Do you buy that? Charusso: You let the buffer overflow by a bugprone function call, then you adjust the null-terminator by…
				balazskeUnsubmitted Done Reply Inline Actions First sentence is OK, the later part relatively better. Probably the explanations about why it is good to turn it on or off can be left out, these are misunderstandable. (If the manual null-termination is done, in one case it is a "common idiom but can be misused so make warning" on other case "these warnings generates only extra noise".) It can be something like this (if correct): Whether the checker wars on the function calls immediately or warns on bugprone hand-written null-termination of function call made strings. Default value is on because null-terminating the string by hand after the insecure function call could be misused. But it is a common idiom to manually null-terminate the strings immediately after the function calls so it could be turned off to reduce noise of the checker. I do not like the word "bugprone", is it correct english? ("Problematic" or "insecure" could be used instead.) balazske: First sentence is OK, the later part relatively better. Probably the explanations about why it…
				CharussoAuthorUnsubmitted Done Reply Inline Actions `bugprone` comes from the Clang-Tidy world and we use it everywhere, but let us change it to `insecure` then. I will copy-paste it. Thanks! Charusso: `bugprone` comes from the Clang-Tidy world and we use it everywhere, but let us change it to…
				Documentation<NotDocumented>,
				Hidden;

				def Str31cChecker : Checker<"31c">,
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions Maybe we could have more descriptive names for these checkers and mention the number of the rule in the `HelpText` only. baloghadamsoftware: Maybe we could have more descriptive names for these checkers and mention the number of the…
				CharussoAuthorUnsubmitted Done Reply Inline Actions The naming is designed with Aaron and Artem, so that it is consistent with the Clang-Tidy naming. We really need to support aliasing names here. Charusso: The naming is designed with Aaron and Artem, so that it is consistent with the Clang-Tidy…
				HelpText<"SEI CERT checker of rules defined in STR31-C">,
				Dependencies<[StrCheckerBase]>,
				Documentation<HasDocumentation>;

				} // end "alpha.cert.str"

	let ParentPackage = SecurityAlpha in {			let ParentPackage = SecurityAlpha in {

	def ArrayBoundChecker : Checker<"ArrayBound">,			def ArrayBoundChecker : Checker<"ArrayBound">,
	HelpText<"Warn about buffer overflows (older checker)">,			HelpText<"Warn about buffer overflows (older checker)">,
	Documentation<HasAlphaDocumentation>;			Documentation<HasAlphaDocumentation>;

	def ArrayBoundCheckerV2 : Checker<"ArrayBoundV2">,			def ArrayBoundCheckerV2 : Checker<"ArrayBoundV2">,
	HelpText<"Warn about buffer overflows (newer checker)">,			HelpText<"Warn about buffer overflows (newer checker)">,
	▲ Show 20 Lines • Show All 720 Lines • Show Last 20 Lines

clang/lib/AST/FormatString.cpp

	// FormatString.cpp - Common stuff for handling printf/scanf formats -- C++ --			// FormatString.cpp - Common stuff for handling printf/scanf formats -- C++ --
	//			//
	// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.			// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
	// See https://llvm.org/LICENSE.txt for license information.			// See https://llvm.org/LICENSE.txt for license information.
	// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception			// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//
	//			//
	// Shared details for processing format strings of printf and scanf			// Shared details for processing format strings of printf and scanf
	// (and friends).			// (and friends).
	//			//
	//===----------------------------------------------------------------------===//			//===----------------------------------------------------------------------===//

	#include "FormatStringParsing.h"			#include "clang/AST/FormatStringParsing.h"
	#include "clang/Basic/LangOptions.h"			#include "clang/Basic/LangOptions.h"
	#include "clang/Basic/TargetInfo.h"			#include "clang/Basic/TargetInfo.h"
	#include "llvm/Support/ConvertUTF.h"			#include "llvm/Support/ConvertUTF.h"

	using clang::analyze_format_string::ArgType;			using clang::analyze_format_string::ArgType;
	using clang::analyze_format_string::FormatStringHandler;			using clang::analyze_format_string::FormatStringHandler;
	using clang::analyze_format_string::FormatSpecifier;			using clang::analyze_format_string::FormatSpecifier;
	using clang::analyze_format_string::LengthModifier;			using clang::analyze_format_string::LengthModifier;
	▲ Show 20 Lines • Show All 988 Lines • Show Last 20 Lines

clang/lib/AST/FormatStringParsing.h

This file was moved to clang/include/clang/AST/FormatStringParsing.h.

clang/lib/AST/PrintfFormatString.cpp

//== PrintfFormatString.cpp - Analysis of printf format strings --- C++ --==//		//== PrintfFormatString.cpp - Analysis of printf format strings --- C++ --==//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Handling of format string in printf and friends. The structure of format		// Handling of format string in printf and friends. The structure of format
// strings for fprintf() are described in C99 7.19.6.1.		// strings for fprintf() are described in C99 7.19.6.1.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "FormatStringParsing.h"
#include "clang/AST/FormatString.h"		#include "clang/AST/FormatString.h"
		#include "clang/AST/FormatStringParsing.h"
#include "clang/AST/OSLog.h"		#include "clang/AST/OSLog.h"
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"
#include "llvm/Support/Regex.h"		#include "llvm/Support/Regex.h"

using clang::analyze_format_string::ArgType;		using clang::analyze_format_string::ArgType;
using clang::analyze_format_string::FormatStringHandler;		using clang::analyze_format_string::FormatStringHandler;
using clang::analyze_format_string::LengthModifier;		using clang::analyze_format_string::LengthModifier;
using clang::analyze_format_string::OptionalAmount;		using clang::analyze_format_string::OptionalAmount;
using clang::analyze_format_string::ConversionSpecifier;		using clang::analyze_format_string::ConversionSpecifier;
using clang::analyze_printf::PrintfSpecifier;		using clang::analyze_printf::PrintfSpecifier;

using namespace clang;		using namespace clang;

typedef clang::analyze_format_string::SpecifierResult<PrintfSpecifier>
PrintfSpecifierResult;

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// Methods for parsing format strings.		// Methods for parsing format strings.
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

using analyze_format_string::ParseNonPositionAmount;		using analyze_format_string::ParseNonPositionAmount;

static bool ParsePrecision(FormatStringHandler &H, PrintfSpecifier &FS,		static bool ParsePrecision(FormatStringHandler &H, PrintfSpecifier &FS,
const char Start, const char &Beg, const char *E,		const char Start, const char &Beg, const char *E,
Show All 23 Lines	if (Warn) {
if (Flag == "")		if (Flag == "")
H.HandleEmptyObjCModifierFlag(FlagBeg, E - FlagBeg);		H.HandleEmptyObjCModifierFlag(FlagBeg, E - FlagBeg);
else		else
H.HandleInvalidObjCModifierFlag(FlagBeg, E - FlagBeg);		H.HandleInvalidObjCModifierFlag(FlagBeg, E - FlagBeg);
}		}
return true;		return true;
}		}

static PrintfSpecifierResult ParsePrintfSpecifier(FormatStringHandler &H,		analyze_format_string::PrintfSpecifierResult
const char *&Beg,		analyze_format_string::ParsePrintfSpecifier(FormatStringHandler &H,
const char *E,		const char &Beg, const char E,
unsigned &argIndex,		unsigned &argIndex,
const LangOptions &LO,		const LangOptions &LO,
const TargetInfo &Target,		const TargetInfo &Target, bool Warn,
bool Warn,
bool isFreeBSDKPrintf) {		bool isFreeBSDKPrintf) {

using namespace clang::analyze_format_string;		using namespace clang::analyze_format_string;
using namespace clang::analyze_printf;		using namespace clang::analyze_printf;

const char *I = Beg;		const char *I = Beg;
const char *Start = nullptr;		const char *Start = nullptr;
UpdateOnReturn <const char*> UpdateBeg(Beg, I);		UpdateOnReturn <const char*> UpdateBeg(Beg, I);

▲ Show 20 Lines • Show All 1,034 Lines • Show Last 20 Lines

clang/lib/AST/ScanfFormatString.cpp

//= ScanfFormatString.cpp - Analysis of printf format strings --- C++ --===//		//= ScanfFormatString.cpp - Analysis of printf format strings --- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// Handling of format string in scanf and friends. The structure of format		// Handling of format string in scanf and friends. The structure of format
// strings for fscanf() are described in C99 7.19.6.2.		// strings for fscanf() are described in C99 7.19.6.2.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#include "clang/AST/FormatString.h"		#include "clang/AST/FormatString.h"
#include "FormatStringParsing.h"		#include "clang/AST/FormatStringParsing.h"
#include "clang/Basic/TargetInfo.h"		#include "clang/Basic/TargetInfo.h"

using clang::analyze_format_string::ArgType;		using clang::analyze_format_string::ArgType;
using clang::analyze_format_string::FormatStringHandler;		using clang::analyze_format_string::FormatStringHandler;
using clang::analyze_format_string::LengthModifier;		using clang::analyze_format_string::LengthModifier;
using clang::analyze_format_string::OptionalAmount;		using clang::analyze_format_string::OptionalAmount;
using clang::analyze_format_string::ConversionSpecifier;		using clang::analyze_format_string::ConversionSpecifier;
using clang::analyze_scanf::ScanfConversionSpecifier;		using clang::analyze_scanf::ScanfConversionSpecifier;
using clang::analyze_scanf::ScanfSpecifier;		using clang::analyze_scanf::ScanfSpecifier;
using clang::UpdateOnReturn;		using clang::UpdateOnReturn;
using namespace clang;		using namespace clang;

typedef clang::analyze_format_string::SpecifierResult<ScanfSpecifier>
ScanfSpecifierResult;

static bool ParseScanList(FormatStringHandler &H,		static bool ParseScanList(FormatStringHandler &H,
ScanfConversionSpecifier &CS,		ScanfConversionSpecifier &CS,
const char &Beg, const char E) {		const char &Beg, const char E) {
const char *I = Beg;		const char *I = Beg;
const char *start = I - 1;		const char *start = I - 1;
UpdateOnReturn <const char*> UpdateBeg(Beg, I);		UpdateOnReturn <const char*> UpdateBeg(Beg, I);

// No more characters?		// No more characters?
Show All 28 Lines	static bool ParseScanList(FormatStringHandler &H,
}		}

CS.setEndScanList(I);		CS.setEndScanList(I);
return false;		return false;
}		}

// FIXME: Much of this is copy-paste from ParsePrintfSpecifier.		// FIXME: Much of this is copy-paste from ParsePrintfSpecifier.
// We can possibly refactor.		// We can possibly refactor.
static ScanfSpecifierResult ParseScanfSpecifier(FormatStringHandler &H,		analyze_format_string::ScanfSpecifierResult
const char *&Beg,		analyze_format_string::ParseScanfSpecifier(FormatStringHandler &H,
const char *E,		const char &Beg, const char E,
unsigned &argIndex,		unsigned &argIndex,
const LangOptions &LO,		const LangOptions &LO,
const TargetInfo &Target) {		const TargetInfo &Target) {
using namespace clang::analyze_format_string;		using namespace clang::analyze_format_string;
using namespace clang::analyze_scanf;		using namespace clang::analyze_scanf;
const char *I = Beg;		const char *I = Beg;
const char *Start = nullptr;		const char *Start = nullptr;
UpdateOnReturn <const char*> UpdateBeg(Beg, I);		UpdateOnReturn <const char*> UpdateBeg(Beg, I);

// Look for a '%' character that indicates the start of a format specifier.		// Look for a '%' character that indicates the start of a format specifier.
for ( ; I != E ; ++I) {		for ( ; I != E ; ++I) {
▲ Show 20 Lines • Show All 479 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Checkers/AllocationState.h

	Show All 19 Lines
	ProgramStateRef markReleased(ProgramStateRef State, SymbolRef Sym,			ProgramStateRef markReleased(ProgramStateRef State, SymbolRef Sym,
	const Expr *Origin);			const Expr *Origin);

	/// This function provides an additional visitor that augments the bug report			/// This function provides an additional visitor that augments the bug report
	/// with information relevant to memory errors caused by the misuse of			/// with information relevant to memory errors caused by the misuse of
	/// AF_InnerBuffer symbols.			/// AF_InnerBuffer symbols.
	std::unique_ptr<BugReporterVisitor> getInnerPointerBRVisitor(SymbolRef Sym);			std::unique_ptr<BugReporterVisitor> getInnerPointerBRVisitor(SymbolRef Sym);

				std::unique_ptr<BugReporterVisitor> getMallocBRVisitor(SymbolRef Sym);
				balazskeUnsubmitted Done Reply Inline Actions This documentation could be improved or left out. balazske: This documentation could be improved or left out.
				CharussoAuthorUnsubmitted Done Reply Inline Actions Totally, I have already forgotten I wanted to remove it. Thanks! Charusso: Totally, I have already forgotten I wanted to remove it. Thanks!

	/// 'Sym' represents a pointer to the inner buffer of a container object.			/// 'Sym' represents a pointer to the inner buffer of a container object.
	/// This function looks up the memory region of that object in			/// This function looks up the memory region of that object in
	/// DanglingInternalBufferChecker's program state map.			/// DanglingInternalBufferChecker's program state map.
	const MemRegion *getContainerObjRegion(ProgramStateRef State, SymbolRef Sym);			const MemRegion *getContainerObjRegion(ProgramStateRef State, SymbolRef Sym);

	} // end namespace allocation_state			} // end namespace allocation_state

	} // end namespace ento			} // end namespace ento
	} // end namespace clang			} // end namespace clang

	#endif			#endif

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt

Show All 12 Lines	add_clang_library(clangStaticAnalyzerCheckers
BoolAssignmentChecker.cpp		BoolAssignmentChecker.cpp
BuiltinFunctionChecker.cpp		BuiltinFunctionChecker.cpp
CStringChecker.cpp		CStringChecker.cpp
CStringSyntaxChecker.cpp		CStringSyntaxChecker.cpp
CallAndMessageChecker.cpp		CallAndMessageChecker.cpp
CastSizeChecker.cpp		CastSizeChecker.cpp
CastToStructChecker.cpp		CastToStructChecker.cpp
CastValueChecker.cpp		CastValueChecker.cpp
		cert/StrChecker.cpp
		baloghadamsoftwareUnsubmitted Done Reply Inline Actions Are there so many things common among checkers for SEI CERT string rules that the best option is to implemenet them in a single huge class/file? Would not it be more appropriate to create a library instead for common functions and then implement each rule in a separate checker class? Huge classes are difficult to understand. baloghadamsoftware: Are there so many things common among checkers for SEI CERT string rules that the best option…
		CharussoAuthorUnsubmitted Done Reply Inline Actions things common among checkers for SEI CERT string rules Well, yes. best option is to implemenet them in a single huge file Sadly that is our standard now and there is no API to make it well-engineered by default. This is my first real checker and I could not really be the pioneer of creating the new standard of checker writing as I am not that experienced on my own. I believe if all the Ericsson people would gather and brainstorm for a long period of time and create the initial better way of doing checkers as a design document we could improve such document as the whole community. If some extent of consensus met by most of the developers, we could start to create a better environment, which already would create the design how should we split up the files. I do not see the immediate gains of splitting up stuff as long as we do not create such design to make sure we do not need to split up or move code anymore. I wanted to follow the latest trends with that checker and eventually come up with the design how to write checkers. After that I have realized it is way too much work for an individual and it is a very long run where such change is necessary: in function summaries (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html) Charusso: > things common among checkers for SEI CERT string rules Well, yes. > best option is to…
CheckObjCDealloc.cpp		CheckObjCDealloc.cpp
CheckObjCInstMethSignature.cpp		CheckObjCInstMethSignature.cpp
CheckPlacementNew.cpp		CheckPlacementNew.cpp
CheckSecuritySyntaxOnly.cpp		CheckSecuritySyntaxOnly.cpp
CheckSizeofPointer.cpp		CheckSizeofPointer.cpp
CheckerDocumentation.cpp		CheckerDocumentation.cpp
ChrootChecker.cpp		ChrootChecker.cpp
CloneChecker.cpp		CloneChecker.cpp
▲ Show 20 Lines • Show All 112 Lines • Show Last 20 Lines

clang/lib/StaticAnalyzer/Checkers/MallocChecker.cpp

Show First 20 Lines • Show All 3,370 Lines • ▼ Show 20 Lines	if (!RS.isEmpty()) {
}		}
}		}
}		}

namespace clang {		namespace clang {
namespace ento {		namespace ento {
namespace allocation_state {		namespace allocation_state {

		std::unique_ptr<BugReporterVisitor> getMallocBRVisitor(SymbolRef Sym) {
		return std::make_unique<MallocBugVisitor>(Sym);
		}

ProgramStateRef		ProgramStateRef
markReleased(ProgramStateRef State, SymbolRef Sym, const Expr *Origin) {		markReleased(ProgramStateRef State, SymbolRef Sym, const Expr *Origin) {
AllocationFamily Family = AF_InnerBuffer;		AllocationFamily Family = AF_InnerBuffer;
return State->set<RegionState>(Sym, RefState::getReleased(Family, Origin));		return State->set<RegionState>(Sym, RefState::getReleased(Family, Origin));
}		}

} // end namespace allocation_state		} // end namespace allocation_state
} // end namespace ento		} // end namespace ento
Show All 35 Lines

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp

This file was added.

				//===- StrCheckerBase - SEI CERT rules checker defined in STR ---- C++ --===//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// This file defines StrCheckerBase which tries to find the SEI CERT string
				// handler function issues.
				// The rules can be found in section 'Rule 07. Characters and Strings (STR)':
				// - https://wiki.sei.cmu.edu/confluence/x/ptUxBQ
				//
				// This checker is a base checker which consist of the following checkers:
				// - STR31-C: Guarantee that storage for strings has sufficient space for
				// character data and the null terminator:
				// - https://wiki.sei.cmu.edu/confluence/x/sNUxBQ
				//
				//===----------------------------------------------------------------------===//

				#include "AllocationState.h"
				#include "clang/AST/FormatString.h"
				#include "clang/AST/FormatStringParsing.h"
				#include "clang/Basic/TargetInfo.h"
				#include "clang/Lex/Lexer.h"
				#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
				#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
				#include "clang/StaticAnalyzer/Core/Checker.h"
				#include "clang/StaticAnalyzer/Core/CheckerManager.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
				#include "clang/StaticAnalyzer/Core/PathSensitive/DynamicSize.h"
				#include <utility>

				using namespace clang;
				using namespace ento;
				using namespace analyze_printf;
				using namespace analyze_scanf;
				using namespace analyze_format_string;

				namespace {

				/// Helper struct for defining the function call signatures of 'CDM'.
				/// The fields are defining what is the index of the interesting arguments.
				struct CallContext {
				CallContext(Optional<unsigned> DestinationPos,
				Optional<unsigned> SourcePos = None)
				: DestinationPos(DestinationPos), SourcePos(SourcePos) {}

				Optional<unsigned> DestinationPos;
				Optional<unsigned> SourcePos;
				};

				class StrCheckerBase : public Checker<check::PostCall> {
				public:
				/// Check the function calls defined in 'CDM'.
				balazskeUnsubmitted Done Reply Inline Actions The functions `gets`, `strcpy` return a null-terminated string according to standard, probably `sprintf` too, and the `fscanf` `%s` output is null-terminated too even if no length is specified (maybe it writes outside of the specified buffer but null terminated). balazske: The functions `gets`, `strcpy` return a null-terminated string according to standard, probably…
				CharussoAuthorUnsubmitted Done Reply Inline Actions From where do you observe such information? I have not seen anything `strcpy()`-specific in the standard. My observation clearly says that, if there is not enough space for the data you could even write to overlapping memory, or as people says, you could meet nasal demons. It is called undefined behavior. Charusso: From where do you observe such information? I have not seen anything `strcpy()`-specific in the…
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions From `cppreference.com`: `strcpy()` -- Copies the null-terminated byte string pointed to by src, including the null terminator, to the character array whose first element is pointed to by dest. `gets()` -- //https://en.cppreference.com/w/c/io/gets// `scanf`, `fscanf`, `sscan` format `%s` -- If width specifier is used, matches up to width or until the first whitespace character, whichever appears first. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters) Thus @balazske is completely right. All these functions add the null-terminator. Of course, if the string does not fit, it overwrites the memory after the buffer which is undefined behavior. But the null-terminator is definitely added. baloghadamsoftware: From `cppreference.com`: `strcpy()` -- //Copies the null-terminated byte string pointed to by…
				CharussoAuthorUnsubmitted Done Reply Inline Actions Well, let assume that we rely on the cppreference page from now on as our best approximation of the required behavior. Thanks for the notes. `strcpy()`: Copies the null-terminated byte string pointed to by src, including the null terminator All these functions add the null-terminator. Copies does not mean it adds the null-terminator to end of the `dest` explicitly. It is possible the `src` is already not null-terminated so there is no way to copy the `'\0'`, and so that the STR32-C rule made: https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string I had not checked the other functions in the standard. `gets()` and `fscanf()` is going to be null-terminated according to cppreference and the IEEE standard, cool. I will release my fix on that topic in D71033. Charusso: Well, let assume that we rely on the cppreference page from now on as our best approximation of…
				///
				/// - In case of STR31-C we want to emit an error report immediately on
				/// possible buffer overflows caused by function calls defined in 'CDM' under
				/// STR31-C.
				void checkPostCall(const CallEvent &Call, CheckerContext &C) const;

				/// \{
				/// Check the function calls defined in 'CDM'.
				void checkGets(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const;
				void checkSprintf(const CallEvent &Call, const CallContext &CallC,
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions I would avoid reusing the prefix `check` for functions that are not inherited from `Checker`. Alternatives could be `handle`, `process` etc. for modeling and `verify`, `validate` etc. for checking for error. baloghadamsoftware: I would avoid reusing the prefix `check` for functions that are not inherited from `Checker`.
				CharussoAuthorUnsubmitted Done Reply Inline Actions The core checkers which I have seen use the prefix `check` which I became in love with. I have not split the modeling/checking logic up enough well so there is no need to use prefix `verify/validate` for now, but I prefer to use the prefix `eval` for the same reason. Charusso: The core checkers which I have seen use the prefix `check` which I became in love with. I have…
				CheckerContext &C) const;
				void checkFscanf(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const;
				void checkStrcpy(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const;
				/// \}

				bool EnableStr31cChecker = false;

				private:
				using StrCheck = std::function<void(const StrCheckerBase *, const CallEvent &,
				const CallContext &, CheckerContext &)>;

				const CallDescriptionMap<std::pair<StrCheck, CallContext>> CDM = {
				// The following checks STR31-C rules.
				// char gets(char dest);
				{{"gets", 1}, {&StrCheckerBase::checkGets, {0}}},
				// int sprintf(char dest, const char format, ... [const char *src]);
				{{"sprintf", None, 2}, {&StrCheckerBase::checkSprintf, {0, 2}}},
				// int fscanf(FILE stream, const char format, ... [char *dest]);
				{{"fscanf", None, 2}, {&StrCheckerBase::checkFscanf, {2}}},
				// char strcpy(char dest, const char *src);
				{{"strcpy", 2}, {&StrCheckerBase::checkStrcpy, {0, 1}}}};

				BugType BT{this, "Insecure string handler function call",
				categories::SecurityError};
				};
				} // namespace

				//===----------------------------------------------------------------------===//
				// Helper functions.
				//===----------------------------------------------------------------------===//

				/// Returns the interesting region of \p V.
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions Do we really need a separate utility function for that? baloghadamsoftware: Do we really need a separate utility function for that?
				CharussoAuthorUnsubmitted Done Reply Inline Actions Yes, it is handy. Charusso: Yes, it is handy.
				static const MemRegion *getRegion(SVal V) {
				const MemRegion *MR = V.getAsRegion();
				if (!MR)
				return nullptr;

				return MR->getBaseRegion();
				}

				/// Print pretty the expression \p E.
				std::string printPretty(const Expr *E, CheckerContext &C) {
				assert(E);
				SmallString<128> Str;
				llvm::raw_svector_ostream Out(Str);
				E->printPretty(Out, /Helper=/nullptr,
				C.getASTContext().getPrintingPolicy());
				return Out.str().str();
				}

				/// It tries to create an error report of possible buffer overflow.
				///
				/// \param DestPosOffset In case of a variadic function there are multiple
				/// destination arguments so that we need to pass the
				/// offset from the first one currently being modeled.
				static void createCallOverflowReport(const CallEvent &Call,
				const CallContext &CallC,
				CheckerContext &C, const BugType &BT,
				unsigned DestPosOffset = 0) {
				unsigned DestPos = *CallC.DestinationPos + DestPosOffset;
				SVal DestV = Call.getArgSVal(DestPos);
				// FIXME: Handle offsets.
				const MemRegion *MR = getRegion(DestV);
				if (!MR)
				return;

				const ExplodedNode *ErrorNode = C.generateErrorNode();
				if (!ErrorNode)
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions Create `SmallString`, use a stream to print into it, create another one in `printPretty`, convert it to `std::string` and then copy it again to the original `SmallString`? Why? baloghadamsoftware: Create `SmallString`, use a stream to print into it, create another one in `printPretty`…
				CharussoAuthorUnsubmitted Done Reply Inline Actions I like the style of the stream chained together and the easy usage of `printPretty()`. Basically that is why. Charusso: I like the style of the stream chained together and the easy usage of `printPretty()`.
				return;

				ProgramStateRef State = C.getState();
				StringRef CallName = Call.getCalleeIdentifier()->getName();
				const Expr *Arg = Call.getArgExpr(DestPos);

				SmallString<256> Msg;
				llvm::raw_svector_ostream Out(Msg);
				Out << '\'' << CallName << "' could write outside of '" << printPretty(Arg, C)
				<< '\'';

				auto Report =
				std::make_unique<PathSensitiveBugReport>(BT, Out.str(), ErrorNode);
				Report->addRange(Arg->getSourceRange());

				// Track the allocation.
				bugreporter::trackExpressionValue(ErrorNode, Arg, *Report);
				if (const SymbolRef Sym = DestV.getAsSymbol())
				Report->addVisitor(allocation_state::getMallocBRVisitor(Sym));

				C.emitReport(std::move(Report));
				}

				//===----------------------------------------------------------------------===//
				// The following checks STR31-C rules.
				//===----------------------------------------------------------------------===//

				void StrCheckerBase::checkGets(const CallEvent &Call, const CallContext &CallC,
				CheckerContext &C) const {
				baloghadamsoftwareUnsubmitted Done Reply Inline Actions `sprintf` not non-compliant generally. You should calculate the possible maximal length of the string and compare it to the size of the target region. baloghadamsoftware: `sprintf` not non-compliant generally. You should calculate the possible maximal length of the…
				CharussoAuthorUnsubmitted Done Reply Inline Actions I really wanted to check for a single `%s`, but somehow I have removed it, facepalm. Thanks! Charusso: I really wanted to check for a single `%s`, but somehow I have removed it, facepalm. Thanks!
				if (EnableStr31cChecker)
				createCallOverflowReport(Call, CallC, C, BT);
				}

				void StrCheckerBase::checkSprintf(const CallEvent &Call,
				CharussoAuthorUnsubmitted Done Reply Inline Actions We can do the opposite to see whether the destination array's type is an array, so that we do not need `getDynamicSizeExpr`, yay. Charusso: We can do the opposite to see whether the destination array's type is an array, so that we do…
				const CallContext &CallC,
				CheckerContext &C) const {
				if (!EnableStr31cChecker)
				return;

				const auto *FormatExpr =
				dyn_cast<StringLiteral>(Call.getArgExpr(1)->IgnoreImpCasts());
				if (!FormatExpr)
				return;

				baloghadamsoftwareUnsubmitted Done Reply Inline Actions Checking for `%s` is not enough. `%2s` may also be a problem if the buffer is `1` character long. baloghadamsoftware: Checking for `%s` is not enough. `%2s` may also be a problem if the buffer is `1` character…
				CharussoAuthorUnsubmitted Done Reply Inline Actions Yes, there are other problems to check. I wanted to catch the given examples at first to make sure the patch is easy to understand. Thanks for the idea, we definitely need to use more of the dynamic size support. I have added this extra feature for both `fscanf()` and `sprintf()`. Charusso: Yes, there are other problems to check. I wanted to catch the given examples at first to make…
				StringRef FormatStr = FormatExpr->getString();
				unsigned Idx = 0;
				const LangOptions &LO = C.getLangOpts();
				const TargetInfo &TI = C.getASTContext().getTargetInfo();
				const char *I = FormatStr.begin();
				const char *E = FormatStr.end();
				FormatStringHandler H;

				ProgramStateRef State = C.getState();
				SValBuilder &SVB = C.getSValBuilder();

				DefinedOrUnknownSVal DestSize = UnknownVal();
				if (const MemRegion MR = getRegion(Call.getArgSVal(CallC.DestinationPos)))
				DestSize = getDynamicSize(State, MR, SVB);

				llvm::SmallVector<StringRef, 16> Specifiers;
				FormatStr.split(Specifiers, "%");

				// Iterate over the format string specifiers and add together the required
				// sizes into 'RequiredSize'.
				SVal RequiredSize = SVB.makeZeroArrayIndex();
				unsigned SrcPosOffset = 0;
				while (I != E) {
				const PrintfSpecifierResult &R = ParsePrintfSpecifier(
				H, I, E, Idx, LO, TI,
				/Warn=/false, TI.getTriple().getOS() == llvm::Triple::FreeBSD);
				if (R.shouldStop())
				break;

				if (!R.hasValue())
				continue;

				// We only need to check string specifiers.
				const PrintfSpecifier &PS = R.getValue();
				if (PS.getConversionSpecifier().getKind() !=
				ConversionSpecifier::Kind::sArg) {
				++SrcPosOffset;
				continue;
				}

				// Try to obtain the maximal size of the current part of the format string
				// or the size of the source argument. Otherwise create an error report.
				// FIXME: Handle offsets.
				const MemRegion *SrcMR =
				getRegion(Call.getArgSVal(*CallC.SourcePos + SrcPosOffset));
				SVal SrcSize = getDynamicSize(State, SrcMR, SVB);

				const OptionalAmount &Size = PS.getPrecision();
				if (Size.getStart()) {
				// For example: "%.124s.txt"
				// ^~~~ SizeLength
				// ^ SizeEnd
				// ^~~~ SuffixLength
				unsigned TmpRequiredSize = Size.getConstantAmount();
				unsigned SizeLength = Size.getConstantLength();
				const char *SizeEnd = Size.getStart() + SizeLength;
				// The first specifier is an empty string due to splitting.
				unsigned SuffixLength = Specifiers[SrcPosOffset + 1].end() - SizeEnd - 1;

				TmpRequiredSize += SuffixLength;
				RequiredSize = SVB.evalBinOp(State, BO_Add, RequiredSize,
				SVB.makeIntVal(TmpRequiredSize,
				/IsUnsigned=/true),
				SVB.getArrayIndexType());
				} else if (SrcSize.getAs<nonloc::ConcreteInt>()) {
				RequiredSize = SVB.evalBinOp(State, BO_Add, RequiredSize, SrcSize,
				SVB.getArrayIndexType());
				} else {
				// There is no precision set and we cannot measure the size of the source.
				createCallOverflowReport(Call, CallC, C, BT);
				return;
				}

				++SrcPosOffset;
				}

				// We need to be able to store the null-terminator at the end so that
				// the enough destination space is RequiredSize < DestSize.
				SVal IsEnoughSpace = SVB.evalBinOp(State, BO_LT, RequiredSize, DestSize,
				SVB.getArrayIndexType());
				if (State->isNull(IsEnoughSpace).isConstrainedTrue())
				createCallOverflowReport(Call, CallC, C, BT);
				}

				void StrCheckerBase::checkFscanf(const CallEvent &Call,
				const CallContext &CallC,
				CheckerContext &C) const {
				if (!EnableStr31cChecker)
				return;

				const auto *FormatExpr =
				dyn_cast<StringLiteral>(Call.getArgExpr(1)->IgnoreImpCasts());
				if (!FormatExpr)
				return;

				StringRef FormatStr = FormatExpr->getString();
				unsigned Idx = 0;
				const LangOptions &LO = C.getLangOpts();
				const TargetInfo &TI = C.getASTContext().getTargetInfo();
				NoQUnsubmitted Done Reply Inline Actions Why is this a false positive? You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it? NoQ: Why is this a false positive? You're bringing in a completely brand-new machinery here, could…
				CharussoAuthorUnsubmitted Done Reply Inline Actions Hm, yes, I wanted to add comments earlier, sorry. It is still wonky a little-bit, somehow I need to merge the two different errors. Charusso: Hm, yes, I wanted to add comments earlier, sorry. It is still wonky a little-bit, somehow I…
				const char *I = FormatStr.begin();
				const char *E = FormatStr.end();
				FormatStringHandler H;

				ProgramStateRef State = C.getState();
				SValBuilder &SVB = C.getSValBuilder();

				// Iterate over the format string specifiers.
				unsigned DestPosOffset = 0;
				while (I != E) {
				const ScanfSpecifierResult &R = ParseScanfSpecifier(H, I, E, Idx, LO, TI);
				if (R.shouldStop())
				break;

				if (!R.hasValue())
				continue;

				// We only need to check string specifiers.
				const ScanfSpecifier &SS = R.getValue();
				if (SS.getConversionSpecifier().getKind() !=
				ConversionSpecifier::Kind::sArg) {
				++DestPosOffset;
				continue;
				}

				// Try to obtain the maximal size of the current part of the format string.
				const OptionalAmount &Size = SS.getFieldWidth();
				if (Size.getStart()) {
				// FIXME: Handle offsets.
				if (const MemRegion *DestMR = getRegion(
				Call.getArgSVal(*CallC.DestinationPos + DestPosOffset))) {
				SVal RequiredSize = SVB.makeIntVal(Size.getConstantAmount(),
				/IsUnsigned=/true);
				DefinedOrUnknownSVal DestSize = getDynamicSize(State, DestMR, SVB);
				// We need to be able to store the null-terminator at the end so that
				// the enough destination space is RequiredSize < DestSize.
				SVal IsEnoughSpace = SVB.evalBinOp(State, BO_LT, RequiredSize, DestSize,
				SVB.getArrayIndexType());
				if (State->isNull(IsEnoughSpace).isConstrainedTrue()) {
				createCallOverflowReport(Call, CallC, C, BT, DestPosOffset);
				return;
				}
				}
				} else {
				// There is no field width set.
				createCallOverflowReport(Call, CallC, C, BT, DestPosOffset);
				return;
				}

				++DestPosOffset;
				}
				}

				void StrCheckerBase::checkStrcpy(const CallEvent &Call,
				const CallContext &CallC,
				CheckerContext &C) const {
				if (!EnableStr31cChecker)
				return;

				ProgramStateRef State = C.getState();
				SValBuilder &SVB = C.getSValBuilder();
				SVal SrcV = Call.getArgSVal(*CallC.SourcePos);
				SVal DestV = Call.getArgSVal(*CallC.DestinationPos);

				// Check the size of the allocation 'DestSize' to prevent false alarms.
				const MemRegion *SrcMR = getRegion(SrcV);
				const MemRegion *DestMR = getRegion(DestV);
				if (!SrcMR \|\| !DestMR)
				return;

				DefinedOrUnknownSVal SrcSize = getDynamicSize(State, SrcMR, SVB);
				DefinedOrUnknownSVal DestSize = getDynamicSize(State, DestMR, SVB);

				// Only care about allocations we could model.
				// FIXME: Use heuristics what is an allocator and try to obtain its size.
				if (DestSize.isUnknown())
				return;

				if (SrcSize == DestSize)
				return;

				if (const llvm::APSInt *SrcSizeInt = SVB.getKnownValue(State, SrcSize))
				if (const llvm::APSInt *DestSizeInt = SVB.getKnownValue(State, DestSize))
				if (SrcSizeInt->getZExtValue() <= DestSizeInt->getZExtValue())
				return;

				// Allocating size of 'strlen(src)' is most likely fine.
				// FIXME: Make sure we catch 'strlen()' in complex SVals.
				// FIXME: Calcualte the appropriate size for holding the string.
				if (const SymExpr *SE = DestSize.getAsSymExpr())
				if (const auto *SIE = dyn_cast<SymIntExpr>(SE))
				if (const auto *SM = dyn_cast<SymbolMetadata>(SIE->getLHS()))
				if (SM->getRegion() == SrcMR)
				return;

				createCallOverflowReport(Call, CallC, C, BT);
				}

				//===----------------------------------------------------------------------===//
				// Main logic to check a function call.
				//===----------------------------------------------------------------------===//

				void StrCheckerBase::checkPostCall(const CallEvent &Call,
				CheckerContext &C) const {
				if (const auto *Lookup = CDM.lookup(Call)) {
				const StrCheck &Check = Lookup->first;
				Check(this, Call, Lookup->second, C);
				}
				}

				void ento::registerStrCheckerBase(CheckerManager &Mgr) {
				Mgr.registerChecker<StrCheckerBase>();
				}

				bool ento::shouldRegisterStrCheckerBase(const CheckerManager &) { return true; }

				#define REGISTER_CHECKER(Name) \
				void ento::register##Name(CheckerManager &Mgr) { \
				auto *Checker = Mgr.getChecker<StrCheckerBase>(); \
				Checker->Enable##Name = true; \
				} \
				\
				bool ento::shouldRegister##Name(const CheckerManager &) { return true; }

				REGISTER_CHECKER(Str31cChecker)

clang/test/Analysis/Inputs/system-header-simulator.h

	Show All 26 Lines
	extern FILE *__stdoutp;			extern FILE *__stdoutp;
	extern FILE *__stderrp;			extern FILE *__stderrp;

	int scanf(const char *restrict format, ...);			int scanf(const char *restrict format, ...);
	int fscanf(FILE restrict, const char restrict, ...);			int fscanf(FILE restrict, const char restrict, ...);
	int printf(const char *restrict format, ...);			int printf(const char *restrict format, ...);
	int fprintf(FILE restrict, const char restrict, ...);			int fprintf(FILE restrict, const char restrict, ...);
	int getchar(void);			int getchar(void);
				char gets(char buffer);
				char fgets(char str, int n, FILE *stream);
				int sprintf(char buffer, const char format, ...);
				int snprintf(char buffer, size_t count, const char format, ...);
				int fscanf(FILE stream, const char format, ...);
				char getenv(const char varname);

	void setbuf(FILE restrict, char restrict);			void setbuf(FILE restrict, char restrict);
	int setvbuf(FILE restrict, char restrict, int, size_t);			int setvbuf(FILE restrict, char restrict, int, size_t);

	FILE funopen(const void ,			FILE funopen(const void ,
	int ()(void , char *, int),			int ()(void , char *, int),
	int ()(void , const char *, int),			int ()(void , const char *, int),
	fpos_t ()(void , fpos_t, int),			fpos_t ()(void , fpos_t, int),
	▲ Show 20 Lines • Show All 98 Lines • Show Last 20 Lines

clang/test/Analysis/cert/str31-c-extras.cpp

This file was added.

				// RUN: %clang_analyze_cc1 \
				// RUN: -analyzer-checker=core,unix,alpha.security.cert.str.31c \
				// RUN: -verify %s

				// See the examples on the page of STR31-C:
				// https://wiki.sei.cmu.edu/confluence/x/sNUxBQ

				#include "../Inputs/system-header-simulator.h"

				void free(void *memblock);
				void *malloc(size_t size);

				// FIXME: The following tests could fail if the destination array has an offset,
				// which we do not model yet. Also we cannot reason about complex sized
				// allocations which makes this checker a little bit noisy.

				void test_known_size_cannot_overflow() {
				char dest[128];
				strcpy(dest, "src"); // no-warning
				}

				void test_everything_known() {
				char dest = (char )malloc(strlen("Foo") + 1);
				if (dest)
				strcpy(dest, "Foo"); // no-warning
				free(dest);
				}

				void test_complex_size_may_false_positive(const char *src) {
				char *dest;
				size_t size;
				size = strlen(src);

				dest = (char )malloc(size 2 + 2);

				strcpy(dest, &src[13]);
				// expected-warning@-1 {{'strcpy' could write outside of 'dest'}}
				free(dest);
				}

				void test_complex_size_wrong_size(const char *src) {
				char *dest;
				size_t size = strlen(src);

				// FIXME: '+ 2' missing so there is not enough space for the null terminator.
				dest = (char )malloc(2 size + 1);

				strcpy(dest, &src[13]);
				// expected-warning@-1 {{'strcpy' could write outside of 'dest'}}
				free(dest);
				}

				void test_char_by_char_may_false_positive(const char *src, size_t size) {
				const char *s;
				unsigned char c;
				char escape = (char )malloc(size);
				char *e = escape;
				for (s = src; (c = *s) != '\0'; ++s) {
				if (c == '\n') {
				strcpy(e, "\\n");
				// expected-warning@-1 {{'strcpy' could write outside of 'e'}}
				e += 2;
				} else {
				*e++ = c;
				}
				}

				*e = '\0';
				free(escape);
				}

				void test_format_string_read_max_value_single() {
				char buf[1];
				fscanf(stdin, "%2s", buf);
				// expected-warning@-1 {{'fscanf' could write outside of 'buf'}}
				}

				void test_format_string_read_max_value_multiple_not_string_specifier() {
				char d[3], y[4];
				unsigned m;
				fscanf(stdin, "%2s;%u;%4s;", d, &m, y);
				// expected-warning@-1 {{'fscanf' could write outside of 'y'}}
				}

				void test_format_string_write_max_value_single(const char *name) {
				char filename[128]; // 124 + '.txt' + '\0' = 129 characters
				sprintf(filename, "%.124s.txt", name);
				// expected-warning@-1 {{'sprintf' could write outside of 'filename'}}
				}

				void test_format_string_write_max_value_multiple_known_src(const char *name) {
				char filename[128];
				sprintf(filename, "%.123s.%s", name, "extension");
				// expected-warning@-1 {{'sprintf' could write outside of 'filename'}}
				}

clang/test/Analysis/cert/str31-c-notes.cpp

This file was added.

				// RUN: %clang_analyze_cc1 \
				// RUN: -analyzer-checker=core,unix,alpha.security.cert.str.31c \
				// RUN: -analyzer-output=text -verify %s

				// See the examples on the page of STR31-C:
				// https://wiki.sei.cmu.edu/confluence/x/sNUxBQ

				#include "../Inputs/system-header-simulator.h"

				void *malloc(size_t size);
				void free(void *memblock);

				void test_simple_size(unsigned size) {
				size = 13;

				char buf = (char )malloc(size);
				// expected-note@-1 {{Memory is allocated}}
				// expected-note@-2 {{'buf' initialized here}}

				if (gets(buf)) {
				// expected-note@-1 {{'gets' could write outside of 'buf'}}
				// expected-warning@-2 {{'gets' could write outside of 'buf'}}
				}
				NoQUnsubmitted Done Reply Inline Actions I still believe that this should be the warning in this code. This is already broken code. NoQ: I still believe that this should be the warning in this code. This is already broken code.
				CharussoAuthorUnsubmitted Done Reply Inline Actions No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than `grep`. Truncation is fine for STR31-C. That change in my mindset made this checker possible when I have dropped the idea of warn on function calls. Charusso: No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter…
				NoQUnsubmitted Done Reply Inline Actions `Truncation is fine for STR31-C.` This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data". This is absolutely the right place to simply warn on every invocation of `gets()`. NoQ: > `Truncation is fine for STR31-C.` This isn't just truncation, this is buffer overflow, and…
				CharussoAuthorUnsubmitted Done Reply Inline Actions I have found out we have another recommendation for that type of issues: https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now. Charusso: I have found out we have another recommendation for that type of issues: https://wiki.sei.cmu.

				free(buf);
				}

				void test_strlen_without_null_terminator(const char *src) {
				NoQUnsubmitted Done Reply Inline Actions It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here. NoQ: It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.
				CharussoAuthorUnsubmitted Done Reply Inline Actions That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now. Charusso: That is when the string is read. We could have a list of function calls which are non…
				NoQUnsubmitted Done Reply Inline Actions That is when the string is read. That strikes me as an example of the Maslow's hammer. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem. That said, maybe you could get away with discriminating between `char ` and `const char `. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read. NoQ: > That is when the string is read. That strikes me as an example of [[ https://en.wikipedia.
				CharussoAuthorUnsubmitted Done Reply Inline Actions Cool idea, thanks! Charusso: Cool idea, thanks!
				char *buff2;
				char *editor = getenv("EDITOR");
				if (editor != NULL) {
				// expected-note@-1 {{Assuming 'editor' is not equal to NULL}}
				// expected-note@-2 {{Taking true branch}}
				size_t len = strlen(editor);
				buff2 = (char *)malloc(len);
				// expected-note@-1 {{Memory is allocated}}
				// expected-note@-2 {{Value assigned to 'buff2'}}
				if (buff2 != NULL) {
				// expected-note@-1 {{Assuming 'buff2' is not equal to NULL}}
				// expected-note@-2 {{Taking true branch}}
				strcpy(buff2, editor);
				// expected-note@-1 {{'strcpy' could write outside of 'buff2'}}
				// expected-warning@-2 {{'strcpy' could write outside of 'buff2'}}
				}
				free(buff2);
				}
				}

clang/test/Analysis/cert/str31-c.cpp

This file was added.

				// RUN: %clang_analyze_cc1 \
				// RUN: -analyzer-checker=core,unix,alpha.security.cert.str.31c \
				// RUN: -verify %s

				// See the examples on the page of STR31-C:
				// https://wiki.sei.cmu.edu/confluence/x/sNUxBQ

				#include "../Inputs/system-header-simulator.h"

				typedef __SIZE_TYPE__ size_t;

				void free(void *memblock);
				void *malloc(size_t size);

				namespace test_gets_bad {
				#define BUFFER_SIZE 1024

				void func(void) {
				char buf[BUFFER_SIZE];
				if (gets(buf)) {
				// expected-warning@-1 {{'gets' could write outside of 'buf'}}
				}
				}
				} // namespace test_gets_bad

				namespace test_gets_good {
				enum { BUFFERSIZE = 32 };

				baloghadamsoftwareUnsubmitted Done Reply Inline Actions Why do we need a different buffer size here? And why `enum`? baloghadamsoftware: Why do we need a different buffer size here? And why `enum`?
				CharussoAuthorUnsubmitted Done Reply Inline Actions I have copied the examples of the rule STR31-C as-is. I like the variety of buffer sizes. Charusso: I have copied the examples of the rule STR31-C as-is. I like the variety of buffer sizes.
				void func(void) {
				char buff[BUFFERSIZE];

				if (fgets(buff, sizeof(buff), stdin)) {
				}
				}
				} // namespace test_gets_good

				namespace test_sprintf_bad {
				void func(const char *name) {
				char buf[128];
				sprintf(buf, "%s.txt", name);
				// expected-warning@-1 {{'sprintf' could write outside of 'buf'}}
				}
				} // namespace test_sprintf_bad

				namespace test_sprintf_good {
				void func(const char *name) {
				char buff[128];
				snprintf(buff, sizeof(buff), "%s.txt", name);
				}
				} // namespace test_sprintf_good

				namespace test_fscanf_bad {
				enum { BUF_LENGTH = 1024 };

				void get_data(void) {
				char buf[BUF_LENGTH];
				fscanf(stdin, "%s", buf);
				// expected-warning@-1 {{'fscanf' could write outside of 'buf'}}
				}
				} // namespace test_fscanf_bad

				namespace test_fscanf_good {
				enum { BUF_LENGTH = 1024 };

				void get_data(void) {
				char buff[BUF_LENGTH];
				fscanf(stdin, "%1023s", buff);
				}
				} // namespace test_fscanf_good

				namespace test_strcpy_bad {
				int main(int argc, char *argv[]) {
				const char *const name = (argc && argv[0]) ? argv[0] : "";
				char prog_name[128];
				strcpy(prog_name, name);
				// expected-warning@-1 {{'strcpy' could write outside of 'prog_name'}}
				return 0;
				}

				void func(void) {
				char buff[256];
				char *editor = getenv("EDITOR");
				if (editor != NULL) {
				strcpy(buff, editor);
				// expected-warning@-1 {{'strcpy' could write outside of 'buff'}}
				}
				}
				} // namespace test_strcpy_bad

				namespace test_strcpy_good {
				int main(int argc, char *argv[]) {
				const char *const name = (argc && argv[0]) ? argv[0] : "";
				char prog_name2 = (char )malloc(strlen(name) + 1);
				if (prog_name2 != NULL) {
				strcpy(prog_name2, name);
				}

				free(prog_name2);
				return 0;
				}

				void func(void) {
				char *buff2;
				char *editor = getenv("EDITOR");
				if (editor != NULL) {
				size_t len = strlen(editor) + 1;
				buff2 = (char *)malloc(len);
				if (buff2 != NULL) {
				strcpy(buff2, editor);
				}

				free(buff2);
				}
				}
				} // namespace test_strcpy_good

				//===----------------------------------------------------------------------===//
				// The following are from the rule's page which we do not handle yet.
				balazskeUnsubmitted Done Reply Inline Actions Do we get a warning if the `+1` above is missing? balazske: Do we get a warning if the `+1` above is missing?
				CharussoAuthorUnsubmitted Done Reply Inline Actions Test added. Charusso: Test added.
				//===----------------------------------------------------------------------===//

				namespace test_loop_index_bad {
				void copy(size_t n, char src[n], char dest[n]) {
				size_t i;

				for (i = 0; src[i] && (i < n); ++i) {
				dest[i] = src[i];
				}
				dest[i] = '\0';
				}
				} // namespace test_loop_index_bad

				namespace test_loop_index_good {
				void copy(size_t n, char src[n], char dest[n]) {
				size_t i;

				for (i = 0; src[i] && (i < n - 1); ++i) {
				dest[i] = src[i];
				}
				dest[i] = '\0';
				}
				} // namespace test_loop_index_good

				namespace test_getchar_bad {
				enum { BUFFERSIZE = 32 };

				void func(void) {
				char buf[BUFFERSIZE];
				char *p;
				int ch;
				p = buf;
				while ((ch = getchar()) != '\n' && ch != EOF) {
				*p++ = (char)ch;
				}
				*p++ = 0;
				if (ch == EOF) {
				/* Handle EOF or error */
				}
				}
				} // namespace test_getchar_bad

				namespace test_getchar_good {
				enum { BUFFERSIZE = 32 };

				void func(void) {
				char buf[BUFFERSIZE];
				int ch;
				size_t index = 0;
				size_t chars_read = 0;

				while ((ch = getchar()) != '\n' && ch != EOF) {
				if (index < sizeof(buf) - 1) {
				buf[index++] = (char)ch;
				}
				chars_read++;
				}
				buf[index] = '\0'; /* Terminate string */
				if (ch == EOF) {
				/* Handle EOF or error */
				}
				if (chars_read > index) {
				/* Handle truncation */
				}
				}
				} // namespace test_getchar_good