Page MenuHomePhabricator

[analyzer] CERT STR rule checkers: STR31-C
Needs ReviewPublic

Authored by Charusso on Nov 18 2019, 3:07 PM.

Details

Reviewers
NoQ
Summary

This patch introduces a new experimental checker:
alpha.security.cert.str.31c

This checker is implemented based on the following rule:
https://wiki.sei.cmu.edu/confluence/display/c/STR31-C.+Guarantee+that+storage+for+strings+has+sufficient+space+for+character+data+and+the+null+terminator

It warns on misusing the following functions:
strcpy(), gets(), fscanf(), sprintf().

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Now I have simplified the checker so we do not need any ASCII-art, I believe. Do we have any better logic than the current implementation to catch when the string is read?

Charusso updated this revision to Diff 231341.EditedNov 27 2019, 5:56 PM
  • Do not emit a report on re-binding a not null-terminated string, it may will be properly terminated (test added).
  • Catch the null-termination on symbolic level in checkBind().
  • Added a test for the necessity of SValVisitor in this checker.
  • Some refactor.
Charusso marked 2 inline comments as done.Nov 27 2019, 6:22 PM

The false positive suppression by going backwards on the bug-path of stored reports was a very simple concept, which turned out to be a useless one. The rough idea was that to invalidate every report in a path, and every other report would be left because they are reachable from a different path and they are valid errors. So that:

void test_wonky_null_termination(const char *src) {
  char dest[128];
  strcpy(dest, src);
  dest[sizeof(dest) - 1] = '\0';
  do_something(dest); // no-warning
}

The above example null-terminates the string on every path, so we remove the report, and:

void test_may_not_every_path_null_terminated(const char *src) {
  char dest[128];
  strcpy(dest, src);
  if (strlen(src) > sizeof(dest)) {
    dest[sizeof(dest) - 1] = '\0';
  }
  do_something(dest);
  // expected-warning@-1 {{'dest' is not null-terminated}}
}

We say that the above example has a path where we cannot invalidate the report. Invalidating one single control-flow path I think should be safe and keeps every other report like the above examples. In general there is no such simple case exist because we check for the destination array being null so we encounter a state-split whatever we do. Also may it was a bad idea to rewrite the BugReporter for a simple checker.

This null-termination-by-hand made false positives because even the buffer could overflow, we stop up the flow, so the Vasa will not sink. I wanted to emit reports on problematic function calls, but changing the main logic to emit an error on the string-reading made the null-termination store-able. So that we could drop this backward path-traversing logic, that is why I have not answered yet. However, if we still want to emit an error on function-calls, and every of them, we need to be able to chain the calls to the same region, which made by the reports in the previous revisions. Also may we would use this report-storing idea later.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
853

In my mind the documentation is the CERT rule's page. Is it okay?

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
15 ↗(On Diff #231341)

Facepalm.

Charusso updated this revision to Diff 232198.Dec 4 2019, 1:52 PM
  • Tweaking.
Charusso added a comment.EditedDec 4 2019, 1:57 PM

I have picked curl as an example, because it has a measurable amount of reports. I still recommend to make this alpha, because the expression-tracking cannot track members, and peoples are using members everywhere.

  • It is made with CodeChecker:
--analyzers clangsa \
--disable default \
--enable core \
--enable unix \
--enable security.cert.str.StrCheckerBase \
--enable security.cert.str.31.c \
--saargs sa_args.txt
$ cat sa_args.txt

-Xclang -analyzer-config -Xclang silence-checkers="core;unix"
NoQ added a comment.Dec 4 2019, 3:31 PM

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!". This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
853

I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗(On Diff #232198)

Mmm, no, it's not fine. The warning is saying something that's not correct. Even if src is not null-terminated, under the assumption that no out-of-bounds read UB occurs in the strcpy() call, dest is going to always be null-terminated and have a strlen of at most 127. And by continuing our analysis after strcpy(), we inform the user that we assume that no UB has happened on the current execution path yet.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened. For example, when we dereference a pointer, if the pointer is not known to be null, we actively assume that it's non-null. Similarly, if the src string is not null-terminated, we should warn at the invocation of strcpy; if it's not known whether it's null-terminated or not, we should actively assume that it's null-terminated at strcpy.

Also, I usually don't agree with the statements like "This report helped us find a real bug, therefore it's a true positive". You can find a bug in literally any code if you stare at it long enough! I'm glad you found the bug anyway, but if the report is not pointing at the problem directly, we're not doing a great job.

51–53 ↗(On Diff #232198)

Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:

void copy_and_test_if_short(const char *src) {
  char dest[128];
  strcpy(dest, src);
  warn_if_starts_with_foo(dest);
}

void copy_and_test(const char *src) {
  if (strlen(src) < 64)
    copy_and_test_if_short(src);
  else
    copy_and_test_if_long(src);
}
clang/test/Analysis/cert/str31-c-notes.cpp
23

I still believe that this should be *the* warning in this code. This is already broken code.

27–28

It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.

NoQ added a comment.Dec 4 2019, 3:45 PM

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.


The source string is taken from a global table and therefore has known maximum size.


The source string is an IP address and port, which has a known limit on the number of digits it can have.


The source string is a literal '*'.


There's an if-statement that checks that the storage size is sufficient.


The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Charusso updated this revision to Diff 232671.Dec 6 2019, 4:10 PM
Charusso marked 10 inline comments as done.
  • Make the checker alpha.
  • Add docs.
  • Copy-paste @NoQ's test.

Please let us measure your examples at first to have a consensus what we would like to see from this checker.

In D70411#1769803, @NoQ wrote:

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.

I believe I can enhance the checker with the help of NoteTags and give more information for the user why the buffer space is Not sufficient. The size property's upper-bound is not that difficult to measure, I think we can implement this. For now, let me analyze each report.



The source string is taken from a global table and therefore has known maximum size.

The source string is a literal '*'.

When you encounter a member that is when the hand-waving begins. I have made a dump to see what we have. I have not seen any immediate way to connect the dynamic size and the members. For now we have that:
The source region is:
SymRegion{reg_$36<const char * SymRegion{derived_$35{conj_$32{int, LC7, S161233, #1},Element{encodings,0 S64b,const struct content_encoding_s *}}}.name>}
and the allocation's dynamic size is:
extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}

My assumption is that, the size of a member is very arbitrary and we cannot track the member's allocation, so we even could drop every of these reports as being useless. Because I make this checker alpha I am fine with that assumption of members, and we definitely need better ideas.



The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated. For example the VIM project has one single global macro to serve every allocation's size, and that is why the CERT rule made. If the attacker passes a 10k characters long string and you allocate 10k+1 characters for the '\0' you can work with the data safely. Of course you would need to check some upper bound, but that is another story. The key is, any arbitrary buffer size could be a problem. Here:

954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming the condition is true
                       ? INET6_ADDRSTRLEN
                       : strlen(string_ftpport)

addrlen is an arbitrary macro size INET6_ADDRSTRLEN, which not necessarily could hold string_ftpport in strcpy(addr, string_ftpport);. As a human-being you know the most appropriate size, yes. But the program could be ill-formed very easily so we warn on the arbitrary-length path. Here you do not see something like addr[INET6_ADDRSTRLEN - 1] = '\0' so when the string is read you cannot be sure whether it is null-terminated. If line 954 would be evaluated to false, allocating calloc(addrlen + 1, 1) so when addrlen is strlen(string_ftpport) is fine of course, that is the appropriate size to store string_ftpport.



There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct what is going on:

513 | sbytes += tftp_option_add(state, sbytes,
                                (char *)state->spacket.data + sbytes, // warning
                                TFTP_OPTION_TSIZE);

My guess was that to change the check::PostCall to check::PreCall, but if I remember right the same report occurred. Here we return from the function call tftp_option_add, then we warn on its argument which already has been passed to the call. We should not warn when the array is passed to a function which we already have finished reading from. I have not seen any immediate solution, so I count this as an internal bug feature.



The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Let me simplify the report as there is no such feature:

111 | outlen = strlen(outfile);
112 | outdup = strdup(outfile);
116 | dirbuildup = malloc(outlen + 1);

125 | tempdir = strtok(outdup, PATH_DELIMITERS);
136 | if(outdup == tempdir)
138 |   strcpy(dirbuildup, tempdir);

Here we see both tempdir's and dirbuildup's size is based on strlen(outfile) but strtok() strikes in and we conjure a new symbol for dirbuildup so the connection is lost unfortunately.



Please let us brainstorming here.

In D70411#1769786, @NoQ wrote:

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Well, if the member's allocation and the expression-value-tracking would not be hand-waving, such a simple checker could serve a lot. I see your point and you are right the rules are not made for tools. Our tool already knows most of the functionality what is needed for the STR rules. It is started as an experiment and it is working enough well, in my point of view. The member-value-tracking and the lack of standard function modeling of course affects this checker, but that issue affects every other checker as well. If we call them non-alpha, it could be even non-alpha.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

There is not that much heuristic:

  • If you use the unsafe calls and before reading the string you manage to null-terminate it (inject '\0'), it would be a false positive.
  • If you have a blob of memory without considering the length of the stuff it will store, it is a true positive.

Please note that, truncation is fine:

To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size

~ from the rule's page

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.



Let me summarize:

  • There is a lot more to do, we have seen too many issues to call this a finished checker.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).
  • We heavily need to swap the value-tracking with NoteTags to make this useful.
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
33–34 ↗(On Diff #232198)

Well, I think we could check the range info and catch if we have an infeasible case. For now on one of the paths it looks like the string is not null-terminated, where we warn. Of course we need better modeling, that is why I have stated out with a FIXME.

I totally agree with hand-waving "true positives", but this is very unlikely to happen, I just came out with that example to state out the CStringModelingChecker is not enough in its current shape.

51–53 ↗(On Diff #232198)

That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for STR31-C.

In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue.

Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.

clang/test/Analysis/cert/str31-c-notes.cpp
23

No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than grep. Truncation is fine for STR31-C.

That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.

27–28

That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.

Charusso edited the summary of this revision. (Show Details)Dec 6 2019, 4:15 PM
NoQ added a comment.Dec 9 2019, 5:56 PM

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1769786, @NoQ wrote:

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

  • We heavily need to swap the value-tracking with NoteTags to make this useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a warning as soon as we've reached far enough on the execution path to conclude that there's a problem.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
850

Maybe 31c? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable cert.str.31.

855

alpha.cert.str.

clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
23 ↗(On Diff #232671)

These strings are visible to humans, please write some proper English :)

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗(On Diff #232198)

Note that function copy_and_test in my example would not be necessarily visible to you. It may be in another translation unit.

So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.

clang/test/Analysis/cert/str31-c-notes.cpp
23

Truncation is fine for STR31-C.

This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data".

This is absolutely the right place to simply warn on every invocation of gets().

27–28

That is when the string is read.

That strikes me as an example of the Maslow's hammer. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem.

That said, maybe you could get away with discriminating between char * and const char *. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.

Charusso updated this revision to Diff 233252.Dec 10 2019, 7:47 PM
Charusso marked 7 inline comments as done.
Charusso retitled this revision from [analyzer] CERT: StrChecker: 31.c to [analyzer] CERT: StrChecker: Implementing most of the STR31-C.
Charusso edited the summary of this revision. (Show Details)
  • Fix.
In D70411#1776460, @NoQ wrote:

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

If I am right you are thinking about warn on the unsafe function calls which is the first case. I did that on by default. The idea is that, in a security manner we cannot allow hand-waving fixes. To measure stuff we could introduce options which serves the second case. I have added an option to suppress that common null-termination-by-hand idiom, so that I do not recommend anything. The CERT rules are not recommendations so I think now the warning is fine.

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the most of.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the copied stuff's length cannot be larger than the destination's arbitrary capacity, we would not warn. There are tests for that case.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root of security checkers, but I am mostly interested in data-flow analysis.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
51–53 ↗(On Diff #232198)

That unknowness idea is cool, I will leave that comment as not Done.

clang/test/Analysis/cert/str31-c-notes.cpp
23

I have found out we have another recommendation for that type of issues:
https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string

Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now.

27–28

Cool idea, thanks!

Charusso updated this revision to Diff 233903.EditedDec 13 2019, 6:42 PM
Charusso retitled this revision from [analyzer] CERT: StrChecker: Implementing most of the STR31-C to [analyzer] CERT: STR31-C.
  • Iterate over parameters rather arguments for searching string-reading.
  • Better notes of the argument's index.
  • Avoid C.getPredecessor(), use the error-node instead.
Charusso updated this revision to Diff 241752.Jan 31 2020, 9:17 AM
Charusso edited the summary of this revision. (Show Details)
  • 2020-ify the checker writing
  • Remove extra bullet-points of CERT checker documentation: we only need the checker's documentation, not the packages.
balazske added inline comments.
clang/docs/analyzer/checkers.rst
1935

There are already more checkers that can check for CERT related problems but not specially made for these. These checkers do not reside in this new cert group. And generally a checker does not check for specifically a CERT rule, instead for more of them or other things too, or more checkers can detect a single rule. (And the user can think that only these CERT rules are checkable that exist in this package, that is not true.) So I do not like the introduction of this new cert package. (The documentation of existing checkers lists if the checker is designed for a CERT rule.)

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22 ↗(On Diff #241752)

Are there already not other checkers that find security related bugs (the taint checker?)? Why do these not use a SecurityError? It is not bad to have a SecurityError but maybe there is a reason why was it not there already. If these categories are exclusive it is hard to find out what problem (probably already existing bug type in other checkers) belongs to what category (it can be for this checker UnixAPI or MemoryError too?).

Szelethus added inline comments.Mar 23 2020, 5:47 AM
clang/docs/analyzer/checkers.rst
1935

I disagree to some extent. I think it would be great to have a cert package that houses all checkers for each of the rules with the addition of checker aliases. Clang-tidy has something similar as well!

Charusso marked 5 inline comments as done.Mar 23 2020, 7:40 PM

Thanks for the feedback! Given that it will remain an alpha checker for a long time (~1 year), no one really should use it.

clang/docs/analyzer/checkers.rst
1935

I designed the checker only for the rule STR31-C that is why I have picked package cert. Clang-Tidy could aliasing checks. For example the check could be in the bugprone category aliased to cert and both could trigger the analysis.

the user can think that only these CERT rules are checkable

We only need to move security.FloatLoopCounter under the cert package. What else SEI CERT rules are finished off other than package cert? It is not my fault if someone could not package well or could not catch most of the issues of a SEI CERT rule or could not reach the SEI CERT group to note the fact the Analyzer catch a very tiny part of a rule. However, this patch package well and could catch most of the STR31-C rule.

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22 ↗(On Diff #241752)

We have only three pure security checkers: security.insecureAPI, security.FloatLoopCounter (FLP30-C, FLP30-CPP) and alpha.security.cert.pos.34c. The non-alpha checkers are very old, but Ted definitely made that category:

const char *bugType = "Floating point variable used as loop counter";
BR.EmitBasicReport(bugType, "Security", os.str().c_str(),
                   FS->getLocStart(), ranges.data(), ranges.size());

Every other security-ish checker does not catch the CERT rule's examples. May if the checker evolves I would pick MemoryError, but it will not evolve soon.

balazske added inline comments.Mar 24 2020, 10:04 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
855

This text may be hard to understand. The option means "if it is off, the problem report happens only if there is no hand-written null termination of the string"? I looked at the CERT rule but did not find out why is null termination by hand important here. (I did not look at the code to find out what it does.) And "WarnOnCall" suggests that there is warning made on calls if on, or not at all or at other location if off, is this correct?

clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28

This documentation could be improved or left out.

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

Maybe I have wrong expectations about what this checker does but did understand it correctly yet. The main problem here is that dest may be not large enough (if size of src is not known). This would be a better text for the error message? And if this happens (the write outside) it is already a fatal error, can cause crash. If no buffer overflow happens the terminating zero is copied from src, so why would dst be not null terminated?

Charusso marked 4 inline comments as done.Mar 24 2020, 11:45 AM
Charusso added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
855

You let the buffer overflow by a bugprone function call, then you adjust the null-terminator by simply buf[size] = '\0', then you made sure the buffer cannot overflow, since you have terminated it. It is a very common idiom therefore I do not think a hand-written null-termination is a security issue. The SEI CERT rules are all theoretical so you will not find anything useful in practice. My solution is practical.

This text may be hard to understand.

Please note that this text only made for Static Analyzer developers. Let us rephrase it then:

Whether the checker needs to warn on the bugprone function calls immediately or look for bugprone hand-written null-termination of bugprone function call made strings. It is a common idiom to null-terminate the string by hand after the insecure function call produce the string which could be misused so that it is on by default. It is useful to turn it off to reduce the noise of the checker, because people usually null-terminate the string by hand immediately after the bugprone function call produce the string.

Do you buy that?

clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28

Totally, I have already forgotten I wanted to remove it. Thanks!

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

I like that error message because that is the truth, that is the issue. If it would crash so the operating system would detect a fatal error we should not develop Static Analyzer and vulnerabilities would not exist. That would be cool, btw.

balazske added inline comments.Mar 25 2020, 1:58 AM
clang/docs/analyzer/checkers.rst
1935

So I can move this checker: D71510 into alpha.security.cert.err.33c?

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
855

First sentence is OK, the later part relatively better. Probably the explanations about why it is good to turn it on or off can be left out, these are misunderstandable. (If the manual null-termination is done, in one case it is a "common idiom but can be misused so make warning" on other case "these warnings generates only extra noise".)
It can be something like this (if correct):

Whether the checker wars on the function calls immediately or warns on bugprone hand-written null-termination of function call made strings. Default value is on because null-terminating the string by hand after the insecure function call could be misused. But it is a common idiom to manually null-terminate the strings immediately after the function calls so it could be turned off to reduce noise of the checker.

I do not like the word "bugprone", is it correct english? ("Problematic" or "insecure" could be used instead.)

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

There must be some misunderstanding here, I still do not know what this test does. Can probably somebody other explain it? After the first strcpy the src content is copied into dest. If src is not a null-terminated string this is itself a bug and strcpy is undefined behavior. If src has strlen(src) > 127 it is a bug again because the copy can write outside of dest past its end, into the stack frame of the function. After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing. (This strcpy call is the place for a warning. Probably not a fatal error because it is not sure, if size of src is not known.) Otherwise result of strcpy is a null-terminated string in dest. The next *dest = '\0' causes that dest points to an empty string, why is this needed here (it was already null terminated)? The following strcpy overwrites it anyway.

Charusso marked 6 inline comments as done.Mar 25 2020, 12:35 PM

"To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size to hold the character data" - from the rule's page.
Most of the projects are fine truncating by hand because the write happens in somewhat well-bounded strings: IP-addresses, names, numbers... I wanted to make this as practical as possible. Until you are having a null-terminated string without being read, you are most likely fine. Feel free to try this out, probably you would already understand the WarnOnCall option very well.

clang/docs/analyzer/checkers.rst
1935

Well, not exactly. No one should care about alpha checkers except advanced Static Analyzer developers like you do right now. Since alpha is a dead-zone you can put your work anywhere. The core issue was Ted's placing of CERT rules.

I have written the first CERT-checker after Ted's, which introduced such packaging, then it became dead, so I have lost interest to make the package-aliasing a thing. If you wish to move to cert, feel free, but please note that, your first idea of bad-packaging is right. We really need the functionality of package-aliasing but I have ran out of time and the project died as well.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
855

bugprone comes from the Clang-Tidy world and we use it everywhere, but let us change it to insecure then. I will copy-paste it. Thanks!

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

Well, I did not make any comments about what is going on, because it is far away of being finished, it is an alpha checker. Until we have not designed it, it is useless to write documentation, but feel free to read the discussion of the checker's evolution, so that I do not need to repeat myself. Once we have designed it, sure, I will document the design.

Let us first clear your misconceptions:

After the first strcpy the src content is copied into dest. If src is not a null-terminated string this is itself a bug and strcpy is undefined behavior.

Cool, but this is another rule namely STR32-C: D71033. Here we do not care about that issue. Each test should be isolated and test the behavior what we expect. We do not care about another patch/rule here.

If src has strlen(src) > 127 it is a bug again because the copy can write outside of dest past its end, into the stack frame of the function.

It is a bug in a security point of view. However, in the wild, no one really cares about it, and null-terminate the string by hand. That is why this is the test file when WarnOnCall=false to reduce the noise of the checker if someone is fine with such concept, but still want to catch non-null-terminated strings. So that this checker servers the needs of "enough space for holding the string" and "null-terminated strings" given by the assumption src is already null-terminated or dest is null-terminated by hand before the string is possibly read. Most of the C projects have a constant global array which could hold 1024 characters and they do not care if overflow happens, but still care if null-termination happens so truncate the string by hand.

After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing.

Tell to the C open source project (like VIM) writers they are dumb. I still want to make sure they could benefit from this checker in a non-security point of view.

(This strcpy call is the place for a warning. Probably not a fatal error because it is not sure, if size of src is not known.)

Nope, D71033.

Otherwise result of strcpy is a null-terminated string in dest.

Nope, in a non-security point of view there is no issue, because the normal behavior of the program to read an IP-address or something bounded and just a truncation needed for the enter/space characters to the end. This is a normal C-programmer idiom and on projects where no-one cares about security, like VIM.

The next *dest = '\0' causes that dest points to an empty string,

Yes.

why is this needed here (it was already null terminated)? The following strcpy overwrites it anyway.

*dest = '\0' demonstrates the usual C-programmer behavior. They somehow null-terminate the string then everything is alright, no buffer-overflow read could happen. However, on the second time the null-termination may not happen since the dest buffer moves out of scope and possibly read by do_something(). If it is read in a non-null-terminated way, that is a problem in their world.

Probably you get it now why such option introduced. Feel free to try this out. The thing you will see, we cannot point out interesting properties of the bug, such as the destination buffer, so that we cannot give great bug reports, so that the checker remain alpha and useless.

Charusso updated this revision to Diff 254061.Mar 31 2020, 5:50 PM
Charusso marked 6 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Get rid of the secondary behavior for now.
  • Fix review comments.

Given that the secondary behavior confuse people I have removed it for now. May if someone introduce a NullTerminationChecker then we introduce such option to warn on insecure calls instant. Thanks @balazske for influencing that change. @NoQ this project had a deadline like half year ago, could you smash that green button please?

balazske added inline comments.Apr 1 2020, 1:29 AM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
55

The functions gets, strcpy return a null-terminated string according to standard, probably sprintf too, and the fscanf %s output is null-terminated too even if no length is specified (maybe it writes outside of the specified buffer but null terminated).

clang/test/Analysis/cert/str31-c.cpp
117

Do we get a warning if the +1 above is missing?

Charusso updated this revision to Diff 254420.Apr 1 2020, 10:05 PM
Charusso marked 4 inline comments as done.
  • Simplify tests.
  • Remove dead code, they are far away to being used.
  • Add an extra test case.

Thanks for the review, hopefully if I ping @NoQ in every round, it will be green-marked soon.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
55

From where do you observe such information? I have not seen anything strcpy()-specific in the standard. My observation clearly says that, if there is not enough space for the data you *could* even write to overlapping memory, or as people says, you could meet nasal demons. It is called *undefined* behavior.

clang/test/Analysis/cert/str31-c.cpp
117

Test added.

Charusso updated this revision to Diff 254423.Apr 1 2020, 10:09 PM
  • Remove the last dead comment.
Charusso updated this revision to Diff 265959.Sun, May 24, 9:19 PM
Charusso retitled this revision from [analyzer] CERT: STR31-C to [analyzer] CERT STR rule checkers: STR31-C.
  • Refactor.