Page MenuHomePhabricator

[analyzer] CERT: STR31-C
Needs ReviewPublic

Authored by Charusso on Nov 18 2019, 3:07 PM.

Details

Reviewers
NoQ
Summary

This patch introduces a new checker:
alpha.security.cert.str.31c

This checker is implemented based on the following rule:
https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

It warns on misusing the following functions:
strcpy(), gets(), fscanf(), sprintf().

It also has an option on its base checker:
alpha.security.cert.str.StrCheckerBase:WarnOnCall=true

This boolean option is on by default and stands for the following:

Whether the checker needs to warn on the bugprone function
calls. It is a common idiom to null-terminate by hand after
the insecure function call which is easy to misuse so that
it is on by default. If it is off that means we look for a
hand-written null-termination before the string is possibly
read and if we find one we do not report the insecure call.

Diff Detail

Event Timeline

Charusso created this revision.Nov 18 2019, 3:07 PM
Charusso marked an inline comment as done.EditedNov 18 2019, 3:16 PM

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
18

That change is totally unrelated.

NoQ added inline comments.Nov 19 2019, 3:58 PM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280

Why is this a false positive?

You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it?

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

The name seems reasonable enough to me -- it at least has all the relevant information in it, which is a great start.

NoQ added a comment.Nov 20 2019, 5:01 PM

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Charusso updated this revision to Diff 230354.Nov 20 2019, 5:06 PM
Charusso marked 2 inline comments as done.
Charusso retitled this revision from [analyzer][WIP] StrChecker: 31.c to [analyzer][WIP] CERT: StrChecker: 31.c.
  • Added a report when the not null-terminated string is read.
  • Added comments.
  • Fixed some uninteresting nits.
  • No ASCII-art yet.
Charusso added a comment.EditedNov 20 2019, 5:14 PM
In D70411#1754356, @NoQ wrote:

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Hm, the idea is cool, but my checker is not that complex, given that now I have added comments. Thanks, I will adjust the comments with some kind of drawing.

My main problem was that to create a NoteTag when the not null-terminated string is read, but it is after when we emit an error, so I could not emit a note. That is why it emits two different reports, and somehow I need to convert the function call evaluation warning to a note in case when a not null-terminated string is read. Do we have any plans with NoteTags to support the craziest checkers? What do you think about storing the reports in the program state?

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280

Hm, yes, I wanted to add comments earlier, sorry. It is still wonky a little-bit, somehow I need to merge the two different errors.

Charusso added inline comments.Nov 20 2019, 5:21 PM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
171

We can do the opposite to see whether the destination array's type is an array, so that we do not need getDynamicSizeExpr, yay.

Charusso updated this revision to Diff 230702.Nov 22 2019, 11:55 AM
Charusso marked an inline comment as done.
Charusso retitled this revision from [analyzer][WIP] CERT: StrChecker: 31.c to [analyzer] CERT: StrChecker: 31.c.
  • Remove the report storing map so we do not traverse backwards on the bug-path.
  • Use NoteTags instead of reports on problematic function calls.
  • Emit a report only if a not null-terminated string is read.
  • Store whether the string is null-terminated.

Now I have simplified the checker so we do not need any ASCII-art, I believe. Do we have any better logic than the current implementation to catch when the string is read?

Charusso updated this revision to Diff 231341.EditedNov 27 2019, 5:56 PM
  • Do not emit a report on re-binding a not null-terminated string, it may will be properly terminated (test added).
  • Catch the null-termination on symbolic level in checkBind().
  • Added a test for the necessity of SValVisitor in this checker.
  • Some refactor.
Charusso marked 2 inline comments as done.Nov 27 2019, 6:22 PM

The false positive suppression by going backwards on the bug-path of stored reports was a very simple concept, which turned out to be a useless one. The rough idea was that to invalidate every report in a path, and every other report would be left because they are reachable from a different path and they are valid errors. So that:

void test_wonky_null_termination(const char *src) {
  char dest[128];
  strcpy(dest, src);
  dest[sizeof(dest) - 1] = '\0';
  do_something(dest); // no-warning
}

The above example null-terminates the string on every path, so we remove the report, and:

void test_may_not_every_path_null_terminated(const char *src) {
  char dest[128];
  strcpy(dest, src);
  if (strlen(src) > sizeof(dest)) {
    dest[sizeof(dest) - 1] = '\0';
  }
  do_something(dest);
  // expected-warning@-1 {{'dest' is not null-terminated}}
}

We say that the above example has a path where we cannot invalidate the report. Invalidating one single control-flow path I think should be safe and keeps every other report like the above examples. In general there is no such simple case exist because we check for the destination array being null so we encounter a state-split whatever we do. Also may it was a bad idea to rewrite the BugReporter for a simple checker.

This null-termination-by-hand made false positives because even the buffer could overflow, we stop up the flow, so the Vasa will not sink. I wanted to emit reports on problematic function calls, but changing the main logic to emit an error on the string-reading made the null-termination store-able. So that we could drop this backward path-traversing logic, that is why I have not answered yet. However, if we still want to emit an error on function-calls, and every of them, we need to be able to chain the calls to the same region, which made by the reports in the previous revisions. Also may we would use this report-storing idea later.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
801

In my mind the documentation is the CERT rule's page. Is it okay?

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
15 ↗(On Diff #231341)

Facepalm.

Charusso updated this revision to Diff 232198.Dec 4 2019, 1:52 PM
  • Tweaking.
Charusso added a comment.EditedDec 4 2019, 1:57 PM

I have picked curl as an example, because it has a measurable amount of reports. I still recommend to make this alpha, because the expression-tracking cannot track members, and peoples are using members everywhere.

  • It is made with CodeChecker:
--analyzers clangsa \
--disable default \
--enable core \
--enable unix \
--enable security.cert.str.StrCheckerBase \
--enable security.cert.str.31.c \
--saargs sa_args.txt
$ cat sa_args.txt

-Xclang -analyzer-config -Xclang silence-checkers="core;unix"
NoQ added a comment.Dec 4 2019, 3:31 PM

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!". This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
801

I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗(On Diff #232198)

Mmm, no, it's not fine. The warning is saying something that's not correct. Even if src is not null-terminated, under the assumption that no out-of-bounds read UB occurs in the strcpy() call, dest is going to always be null-terminated and have a strlen of at most 127. And by continuing our analysis after strcpy(), we inform the user that we assume that no UB has happened on the current execution path yet.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened. For example, when we dereference a pointer, if the pointer is not known to be null, we actively assume that it's non-null. Similarly, if the src string is not null-terminated, we should warn at the invocation of strcpy; if it's not known whether it's null-terminated or not, we should actively assume that it's null-terminated at strcpy.

Also, I usually don't agree with the statements like "This report helped us find a real bug, therefore it's a true positive". You can find a bug in literally any code if you stare at it long enough! I'm glad you found the bug anyway, but if the report is not pointing at the problem directly, we're not doing a great job.

51–53 ↗(On Diff #232198)

Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:

void copy_and_test_if_short(const char *src) {
  char dest[128];
  strcpy(dest, src);
  warn_if_starts_with_foo(dest);
}

void copy_and_test(const char *src) {
  if (strlen(src) < 64)
    copy_and_test_if_short(src);
  else
    copy_and_test_if_long(src);
}
clang/test/Analysis/cert/str31-c-notes.cpp
22 ↗(On Diff #232198)

I still believe that this should be *the* warning in this code. This is already broken code.

26–27 ↗(On Diff #232198)

It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.

NoQ added a comment.Dec 4 2019, 3:45 PM

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.


The source string is taken from a global table and therefore has known maximum size.


The source string is an IP address and port, which has a known limit on the number of digits it can have.


The source string is a literal '*'.


There's an if-statement that checks that the storage size is sufficient.


The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Charusso updated this revision to Diff 232671.Dec 6 2019, 4:10 PM
Charusso marked 10 inline comments as done.
  • Make the checker alpha.
  • Add docs.
  • Copy-paste @NoQ's test.

Please let us measure your examples at first to have a consensus what we would like to see from this checker.

In D70411#1769803, @NoQ wrote:

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.

I believe I can enhance the checker with the help of NoteTags and give more information for the user why the buffer space is Not sufficient. The size property's upper-bound is not that difficult to measure, I think we can implement this. For now, let me analyze each report.



The source string is taken from a global table and therefore has known maximum size.

The source string is a literal '*'.

When you encounter a member that is when the hand-waving begins. I have made a dump to see what we have. I have not seen any immediate way to connect the dynamic size and the members. For now we have that:
The source region is:
SymRegion{reg_$36<const char * SymRegion{derived_$35{conj_$32{int, LC7, S161233, #1},Element{encodings,0 S64b,const struct content_encoding_s *}}}.name>}
and the allocation's dynamic size is:
extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}

My assumption is that, the size of a member is very arbitrary and we cannot track the member's allocation, so we even could drop every of these reports as being useless. Because I make this checker alpha I am fine with that assumption of members, and we definitely need better ideas.



The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated. For example the VIM project has one single global macro to serve every allocation's size, and that is why the CERT rule made. If the attacker passes a 10k characters long string and you allocate 10k+1 characters for the '\0' you can work with the data safely. Of course you would need to check some upper bound, but that is another story. The key is, any arbitrary buffer size could be a problem. Here:

954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming the condition is true
                       ? INET6_ADDRSTRLEN
                       : strlen(string_ftpport)

addrlen is an arbitrary macro size INET6_ADDRSTRLEN, which not necessarily could hold string_ftpport in strcpy(addr, string_ftpport);. As a human-being you know the most appropriate size, yes. But the program could be ill-formed very easily so we warn on the arbitrary-length path. Here you do not see something like addr[INET6_ADDRSTRLEN - 1] = '\0' so when the string is read you cannot be sure whether it is null-terminated. If line 954 would be evaluated to false, allocating calloc(addrlen + 1, 1) so when addrlen is strlen(string_ftpport) is fine of course, that is the appropriate size to store string_ftpport.



There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct what is going on:

513 | sbytes += tftp_option_add(state, sbytes,
                                (char *)state->spacket.data + sbytes, // warning
                                TFTP_OPTION_TSIZE);

My guess was that to change the check::PostCall to check::PreCall, but if I remember right the same report occurred. Here we return from the function call tftp_option_add, then we warn on its argument which already has been passed to the call. We should not warn when the array is passed to a function which we already have finished reading from. I have not seen any immediate solution, so I count this as an internal bug feature.



The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Let me simplify the report as there is no such feature:

111 | outlen = strlen(outfile);
112 | outdup = strdup(outfile);
116 | dirbuildup = malloc(outlen + 1);

125 | tempdir = strtok(outdup, PATH_DELIMITERS);
136 | if(outdup == tempdir)
138 |   strcpy(dirbuildup, tempdir);

Here we see both tempdir's and dirbuildup's size is based on strlen(outfile) but strtok() strikes in and we conjure a new symbol for dirbuildup so the connection is lost unfortunately.



Please let us brainstorming here.

In D70411#1769786, @NoQ wrote:

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Well, if the member's allocation and the expression-value-tracking would not be hand-waving, such a simple checker could serve a lot. I see your point and you are right the rules are not made for tools. Our tool already knows most of the functionality what is needed for the STR rules. It is started as an experiment and it is working enough well, in my point of view. The member-value-tracking and the lack of standard function modeling of course affects this checker, but that issue affects every other checker as well. If we call them non-alpha, it could be even non-alpha.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

There is not that much heuristic:

  • If you use the unsafe calls and before reading the string you manage to null-terminate it (inject '\0'), it would be a false positive.
  • If you have a blob of memory without considering the length of the stuff it will store, it is a true positive.

Please note that, truncation is fine:

To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size

~ from the rule's page

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.



Let me summarize:

  • There is a lot more to do, we have seen too many issues to call this a finished checker.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).
  • We heavily need to swap the value-tracking with NoteTags to make this useful.
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗(On Diff #232198)

Well, I think we could check the range info and catch if we have an infeasible case. For now on one of the paths it looks like the string is not null-terminated, where we warn. Of course we need better modeling, that is why I have stated out with a FIXME.

I totally agree with hand-waving "true positives", but this is very unlikely to happen, I just came out with that example to state out the CStringModelingChecker is not enough in its current shape.

51–53 ↗(On Diff #232198)

That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for STR31-C.

In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue.

Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.

clang/test/Analysis/cert/str31-c-notes.cpp
22 ↗(On Diff #232198)

No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than grep. Truncation is fine for STR31-C.

That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.

26–27 ↗(On Diff #232198)

That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.

Charusso edited the summary of this revision. (Show Details)Dec 6 2019, 4:15 PM
NoQ added a comment.Dec 9 2019, 5:56 PM

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1769786, @NoQ wrote:

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

  • We heavily need to swap the value-tracking with NoteTags to make this useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a warning as soon as we've reached far enough on the execution path to conclude that there's a problem.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
798

Maybe 31c? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable cert.str.31.

803

alpha.cert.str.

clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
23

These strings are visible to humans, please write some proper English :)

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗(On Diff #232198)

Note that function copy_and_test in my example would not be necessarily visible to you. It may be in another translation unit.

So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.

clang/test/Analysis/cert/str31-c-notes.cpp
22 ↗(On Diff #232198)

Truncation is fine for STR31-C.

This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data".

This is absolutely the right place to simply warn on every invocation of gets().

26–27 ↗(On Diff #232198)

That is when the string is read.

That strikes me as an example of the Maslow's hammer. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem.

That said, maybe you could get away with discriminating between char * and const char *. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.

Charusso updated this revision to Diff 233252.Dec 10 2019, 7:47 PM
Charusso marked 7 inline comments as done.
Charusso retitled this revision from [analyzer] CERT: StrChecker: 31.c to [analyzer] CERT: StrChecker: Implementing most of the STR31-C.
Charusso edited the summary of this revision. (Show Details)
  • Fix.
In D70411#1776460, @NoQ wrote:

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

If I am right you are thinking about warn on the unsafe function calls which is the first case. I did that on by default. The idea is that, in a security manner we cannot allow hand-waving fixes. To measure stuff we could introduce options which serves the second case. I have added an option to suppress that common null-termination-by-hand idiom, so that I do not recommend anything. The CERT rules are not recommendations so I think now the warning is fine.

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the most of.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the copied stuff's length cannot be larger than the destination's arbitrary capacity, we would not warn. There are tests for that case.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root of security checkers, but I am mostly interested in data-flow analysis.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗(On Diff #232198)

That unknowness idea is cool, I will leave that comment as not Done.

clang/test/Analysis/cert/str31-c-notes.cpp
22 ↗(On Diff #232198)

I have found out we have another recommendation for that type of issues:
https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string

Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now.

26–27 ↗(On Diff #232198)

Cool idea, thanks!

Charusso updated this revision to Diff 233903.EditedDec 13 2019, 6:42 PM
Charusso retitled this revision from [analyzer] CERT: StrChecker: Implementing most of the STR31-C to [analyzer] CERT: STR31-C.
  • Iterate over parameters rather arguments for searching string-reading.
  • Better notes of the argument's index.
  • Avoid C.getPredecessor(), use the error-node instead.