This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] CERT STR rule checkers: STR31-C
Needs ReviewPublic

Authored by Charusso on Nov 18 2019, 3:07 PM.

Details

Summary

This patch introduces a new experimental checker:
alpha.security.cert.str.31c

This checker is implemented based on the following rule:
STR31-C: Guarantee that storage for strings has sufficient space for
character data and the null terminator:

It warns on misusing the following functions:
strcpy(), gets(), fscanf(), sprintf().

Diff Detail

Event Timeline

Charusso created this revision.Nov 18 2019, 3:07 PM
Charusso marked an inline comment as done.EditedNov 18 2019, 3:16 PM

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

clang/include/clang/StaticAnalyzer/Core/PathSensitive/DynamicSizeInfo.h
18

That change is totally unrelated.

NoQ added inline comments.Nov 19 2019, 3:58 PM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280

Why is this a false positive?

You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it?

Somehow the MallocBugVisitor is broken in the wild - it does not catch the allocation, and we need to support a lot more to call it non-alpha (see the FIXMEs). I do not want to spend time on fixing the visitor, but rather I would make this checker alpha, and move on. After a while when we have enough alarms we could see what could go wrong. The reports are not bad, I tried to make it super false-positive free.

@aaron.ballman the name became security.cert.str.31.c and we lack the support to invoke every C checker with str.*.c (as I know), but for now it should be fine.

The name seems reasonable enough to me -- it at least has all the relevant information in it, which is a great start.

NoQ added a comment.Nov 20 2019, 5:01 PM

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Charusso updated this revision to Diff 230354.Nov 20 2019, 5:06 PM
Charusso marked 2 inline comments as done.
Charusso retitled this revision from [analyzer][WIP] StrChecker: 31.c to [analyzer][WIP] CERT: StrChecker: 31.c.
  • Added a report when the not null-terminated string is read.
  • Added comments.
  • Fixed some uninteresting nits.
  • No ASCII-art yet.
Charusso added a comment.EditedNov 20 2019, 5:14 PM
In D70411#1754356, @NoQ wrote:

I think it would really help if you draw a state machine for the checker, like the ASCII-art thing in D70470; you don't need to spend a lot of time turning it into ASCII-art, a photo of a quick hand-drawn picture would be totally fine, because it's, first and foremost, for discussion :)

Hm, the idea is cool, but my checker is not that complex, given that now I have added comments. Thanks, I will adjust the comments with some kind of drawing.

My main problem was that to create a NoteTag when the not null-terminated string is read, but it is after when we emit an error, so I could not emit a note. That is why it emits two different reports, and somehow I need to convert the function call evaluation warning to a note in case when a not null-terminated string is read. Do we have any plans with NoteTags to support the craziest checkers? What do you think about storing the reports in the program state?

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
280

Hm, yes, I wanted to add comments earlier, sorry. It is still wonky a little-bit, somehow I need to merge the two different errors.

Charusso added inline comments.Nov 20 2019, 5:21 PM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
171

We can do the opposite to see whether the destination array's type is an array, so that we do not need getDynamicSizeExpr, yay.

Charusso updated this revision to Diff 230702.Nov 22 2019, 11:55 AM
Charusso marked an inline comment as done.
Charusso retitled this revision from [analyzer][WIP] CERT: StrChecker: 31.c to [analyzer] CERT: StrChecker: 31.c.
  • Remove the report storing map so we do not traverse backwards on the bug-path.
  • Use NoteTags instead of reports on problematic function calls.
  • Emit a report only if a not null-terminated string is read.
  • Store whether the string is null-terminated.

Now I have simplified the checker so we do not need any ASCII-art, I believe. Do we have any better logic than the current implementation to catch when the string is read?

Charusso updated this revision to Diff 231341.EditedNov 27 2019, 5:56 PM
  • Do not emit a report on re-binding a not null-terminated string, it may will be properly terminated (test added).
  • Catch the null-termination on symbolic level in checkBind().
  • Added a test for the necessity of SValVisitor in this checker.
  • Some refactor.
Charusso marked 2 inline comments as done.Nov 27 2019, 6:22 PM

The false positive suppression by going backwards on the bug-path of stored reports was a very simple concept, which turned out to be a useless one. The rough idea was that to invalidate every report in a path, and every other report would be left because they are reachable from a different path and they are valid errors. So that:

void test_wonky_null_termination(const char *src) {
  char dest[128];
  strcpy(dest, src);
  dest[sizeof(dest) - 1] = '\0';
  do_something(dest); // no-warning
}

The above example null-terminates the string on every path, so we remove the report, and:

void test_may_not_every_path_null_terminated(const char *src) {
  char dest[128];
  strcpy(dest, src);
  if (strlen(src) > sizeof(dest)) {
    dest[sizeof(dest) - 1] = '\0';
  }
  do_something(dest);
  // expected-warning@-1 {{'dest' is not null-terminated}}
}

We say that the above example has a path where we cannot invalidate the report. Invalidating one single control-flow path I think should be safe and keeps every other report like the above examples. In general there is no such simple case exist because we check for the destination array being null so we encounter a state-split whatever we do. Also may it was a bad idea to rewrite the BugReporter for a simple checker.

This null-termination-by-hand made false positives because even the buffer could overflow, we stop up the flow, so the Vasa will not sink. I wanted to emit reports on problematic function calls, but changing the main logic to emit an error on the string-reading made the null-termination store-able. So that we could drop this backward path-traversing logic, that is why I have not answered yet. However, if we still want to emit an error on function-calls, and every of them, we need to be able to chain the calls to the same region, which made by the reports in the previous revisions. Also may we would use this report-storing idea later.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
801

In my mind the documentation is the CERT rule's page. Is it okay?

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
15

Facepalm.

Charusso updated this revision to Diff 232198.Dec 4 2019, 1:52 PM
  • Tweaking.
Charusso added a comment.EditedDec 4 2019, 1:57 PM

I have picked curl as an example, because it has a measurable amount of reports. I still recommend to make this alpha, because the expression-tracking cannot track members, and peoples are using members everywhere.

  • It is made with CodeChecker:
--analyzers clangsa \
--disable default \
--enable core \
--enable unix \
--enable security.cert.str.StrCheckerBase \
--enable security.cert.str.31.c \
--saargs sa_args.txt
$ cat sa_args.txt

-Xclang -analyzer-config -Xclang silence-checkers="core;unix"
NoQ added a comment.Dec 4 2019, 3:31 PM

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!". This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
801

I'd prefer our own documentation that may link to the CERT page but should also explain, say, which parts of the rule are covered by the checker (even if it's "all of them", they may add more rules in the future), and probably some specifics of the checker's behavior if they aren't obvious from the rule.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
34–35

Mmm, no, it's not fine. The warning is saying something that's not correct. Even if src is not null-terminated, under the assumption that no out-of-bounds read UB occurs in the strcpy() call, dest is going to always be null-terminated and have a strlen of at most 127. And by continuing our analysis after strcpy(), we inform the user that we assume that no UB has happened on the current execution path yet.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened. For example, when we dereference a pointer, if the pointer is not known to be null, we actively assume that it's non-null. Similarly, if the src string is not null-terminated, we should warn at the invocation of strcpy; if it's not known whether it's null-terminated or not, we should actively assume that it's null-terminated at strcpy.

Also, I usually don't agree with the statements like "This report helped us find a real bug, therefore it's a true positive". You can find a bug in literally any code if you stare at it long enough! I'm glad you found the bug anyway, but if the report is not pointing at the problem directly, we're not doing a great job.

52–54

Looks like a false positive. Consider the following perfectly correct (even if a bit inefficient) code that only differs from the current code only in names and context:

void copy_and_test_if_short(const char *src) {
  char dest[128];
  strcpy(dest, src);
  warn_if_starts_with_foo(dest);
}

void copy_and_test(const char *src) {
  if (strlen(src) < 64)
    copy_and_test_if_short(src);
  else
    copy_and_test_if_long(src);
}
clang/test/Analysis/cert/str31-c-notes.cpp
23

I still believe that this should be *the* warning in this code. This is already broken code.

27–28

It is not wrong to free a buffer that isn't null-terminated. We shouldn't warn here.

NoQ added a comment.Dec 4 2019, 3:45 PM

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.


The source string is taken from a global table and therefore has known maximum size.


The source string is an IP address and port, which has a known limit on the number of digits it can have.


The source string is a literal '*'.


There's an if-statement that checks that the storage size is sufficient.


The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Charusso updated this revision to Diff 232671.Dec 6 2019, 4:10 PM
Charusso marked 10 inline comments as done.
  • Make the checker alpha.
  • Add docs.
  • Copy-paste @NoQ's test.

Please let us measure your examples at first to have a consensus what we would like to see from this checker.

In D70411#1769803, @NoQ wrote:

These reports seem to confirm my point. I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways. You most likely won't be able to enumerate all possible ways in which the user can ensure that the destination buffer size is sufficient.

I believe I can enhance the checker with the help of NoteTags and give more information for the user why the buffer space is Not sufficient. The size property's upper-bound is not that difficult to measure, I think we can implement this. For now, let me analyze each report.



The source string is taken from a global table and therefore has known maximum size.

The source string is a literal '*'.

When you encounter a member that is when the hand-waving begins. I have made a dump to see what we have. I have not seen any immediate way to connect the dynamic size and the members. For now we have that:
The source region is:
SymRegion{reg_$36<const char * SymRegion{derived_$35{conj_$32{int, LC7, S161233, #1},Element{encodings,0 S64b,const struct content_encoding_s *}}}.name>}
and the allocation's dynamic size is:
extent_$41{SymRegion{conj_$34{void *, LC7, S161233, #1}}}

My assumption is that, the size of a member is very arbitrary and we cannot track the member's allocation, so we even could drop every of these reports as being useless. Because I make this checker alpha I am fine with that assumption of members, and we definitely need better ideas.



The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated. For example the VIM project has one single global macro to serve every allocation's size, and that is why the CERT rule made. If the attacker passes a 10k characters long string and you allocate 10k+1 characters for the '\0' you can work with the data safely. Of course you would need to check some upper bound, but that is another story. The key is, any arbitrary buffer size could be a problem. Here:

954 | size_t addrlen = INET6_ADDRSTRLEN > strlen(string_ftpport) // Assuming the condition is true
                       ? INET6_ADDRSTRLEN
                       : strlen(string_ftpport)

addrlen is an arbitrary macro size INET6_ADDRSTRLEN, which not necessarily could hold string_ftpport in strcpy(addr, string_ftpport);. As a human-being you know the most appropriate size, yes. But the program could be ill-formed very easily so we warn on the arbitrary-length path. Here you do not see something like addr[INET6_ADDRSTRLEN - 1] = '\0' so when the string is read you cannot be sure whether it is null-terminated. If line 954 would be evaluated to false, allocating calloc(addrlen + 1, 1) so when addrlen is strlen(string_ftpport) is fine of course, that is the appropriate size to store string_ftpport.



There's an if-statement that checks that the storage size is sufficient.

Well, this is the most suspicious report of all the time. Let me reconstruct what is going on:

513 | sbytes += tftp_option_add(state, sbytes,
                                (char *)state->spacket.data + sbytes, // warning
                                TFTP_OPTION_TSIZE);

My guess was that to change the check::PostCall to check::PreCall, but if I remember right the same report occurred. Here we return from the function call tftp_option_add, then we warn on its argument which already has been passed to the call. We should not warn when the array is passed to a function which we already have finished reading from. I have not seen any immediate solution, so I count this as an internal bug feature.



The source string is a token (obtained via strtok) of a string that has the same size as the destination buffer.

Let me simplify the report as there is no such feature:

111 | outlen = strlen(outfile);
112 | outdup = strdup(outfile);
116 | dirbuildup = malloc(outlen + 1);

125 | tempdir = strtok(outdup, PATH_DELIMITERS);
136 | if(outdup == tempdir)
138 |   strcpy(dirbuildup, tempdir);

Here we see both tempdir's and dirbuildup's size is based on strlen(outfile) but strtok() strikes in and we conjure a new symbol for dirbuildup so the connection is lost unfortunately.



Please let us brainstorming here.

In D70411#1769786, @NoQ wrote:

A while back I had a look into implementing some of the CERT checkers and my vague recollection from these attempts was that it's a bad idea to take CERT examples literally. Most of the CERT rules are explained in an informal natural language and it makes CERT great for demonstrating common mistakes that programmers make when they write in C/C++. Humans can easily understand the problem by reading CERT pages. But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Well, if the member's allocation and the expression-value-tracking would not be hand-waving, such a simple checker could serve a lot. I see your point and you are right the rules are not made for tools. Our tool already knows most of the functionality what is needed for the STR rules. It is started as an experiment and it is working enough well, in my point of view. The member-value-tracking and the lack of standard function modeling of course affects this checker, but that issue affects every other checker as well. If we call them non-alpha, it could be even non-alpha.

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

This way your check does indeed pass the tests, but it doesn't make sense both from the rule's perspective (the rule never said that it's wrong to strcpy() into a fixed-size buffer, it only said that you should anyhow guarantee that the storage is sufficient) and from the user's perspective (you won't be able to reduce the gap between false positives and false negatives enough for the check to be useful, even with the subsequent whack-a-mole of adding more heuristics, because what the check is checking is relatively orthogonal to the actual problem you're trying to solve).

There is not that much heuristic:

  • If you use the unsafe calls and before reading the string you manage to null-terminate it (inject '\0'), it would be a false positive.
  • If you have a blob of memory without considering the length of the stuff it will store, it is a true positive.

Please note that, truncation is fine:

To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size

~ from the rule's page

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.



Let me summarize:

  • There is a lot more to do, we have seen too many issues to call this a finished checker.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).
  • We heavily need to swap the value-tracking with NoteTags to make this useful.
clang/test/Analysis/cert/str31-c-fp-suppression.cpp
34–35

Well, I think we could check the range info and catch if we have an infeasible case. For now on one of the paths it looks like the string is not null-terminated, where we warn. Of course we need better modeling, that is why I have stated out with a FIXME.

I totally agree with hand-waving "true positives", but this is very unlikely to happen, I just came out with that example to state out the CStringModelingChecker is not enough in its current shape.

52–54

That is why I have asked whether we have a way to say "when the string is read". I like that heuristic for now, because any kind of not null-terminated string is the root of the evil for STR31-C.

In this case the attacker is the programmer who sends a possibly not null-terminated string around and the function which receives an arbitrary input would carry the issue.

Thanks for the example, I wanted to cover ranges, but I have forgotten it, because in the wild peoples do not check for bounds. Copy-pasted.

clang/test/Analysis/cert/str31-c-notes.cpp
23

No, it is not broken, sadly. I would say it is broken, but this checker tries to be smarter than grep. Truncation is fine for STR31-C.

That change in my mindset made this checker possible when I have dropped the idea of warn on function calls.

27–28

That is when the string is read. We could have a list of function calls which are non-problematic. For now I am fine with that note, because you do not allocate something without reading it before freeing it. It is just an arbitrary test for now.

Charusso edited the summary of this revision. (Show Details)Dec 6 2019, 4:15 PM
NoQ added a comment.Dec 9 2019, 5:56 PM

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.
  • The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

The caption of the CERT rule suggests that both of these are problematic.

In D70411#1769786, @NoQ wrote:

So i suggest that we make one more step back and agree upon what exactly are we checking here. I.e., basically, agree on the tests. Because right now it looks like you're trying to blindly generalize over the examples: "The CERT example has a strcpy() into a fixed-size buffer, let's warn on all strcpy()s into fixed-size buffers!".

Just for clarification, I made the entire MemoryBlockRegion stuff because I do not care how do you allocate a memory block (even a string is counted as that).

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

For example, in the example titled "Noncompliant Code Example (argv)", it is explicitly stated that the problem is not only that the buffer is fixed-size and strcpy() is made, but also that the original string is controlled by the attacker. The right analysis to catch such issues is taint analysis. It's a typical taint problem. I honestly believe you should try to solve it by combining the taint checker with the string checker: "if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn". With the recent advancements in the development of the taint checker, i think you can get pretty far this way without constantly struggling with false positives.

This checker indeed would require an additional taint analysis, but as we do not have a non-alpha variant, I am fine to call every non-null-terminated string reading an issue (as it is), whatever is the source. That is a generalization, yes, in the best possible manner what we have got today. Also I am not a big fan of relying on non-alpha stuff and I am sure I have defined my checker enough well to catch real issues.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

  • We heavily need to swap the value-tracking with NoteTags to make this useful.

Before i forget, i'm opposed to explaining problems as notes. We should emit a warning as soon as we've reached far enough on the execution path to conclude that there's a problem.

clang/test/Analysis/cert/str31-c-notes.cpp
23

Truncation is fine for STR31-C.

This isn't just truncation, this is buffer overflow, and it's definitely not fine if we want to "Guarantee that storage for strings has sufficient space for character data".

This is absolutely the right place to simply warn on every invocation of gets().

27–28

That is when the string is read.

That strikes me as an example of the Maslow's hammer. The static analyzer isn't great at guessing whether the function reads from the string or not. For a dynamic analysis it wouldn't have been a problem.

That said, maybe you could get away with discriminating between char * and const char *. If the string is passed in by const pointer, there's nothing you can do except read from it. If it's passed in by non-const pointer, you can conservatively assume that it isn't read.

NoQ added inline comments.Dec 9 2019, 5:56 PM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
798

Maybe 31c? I'm afraid a dot in package name would be confusing and people will actually try something like to enable/disable cert.str.31.

803

alpha.cert.str.

clang/lib/StaticAnalyzer/Core/CommonBugCategories.cpp
23

These strings are visible to humans, please write some proper English :)

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
52–54

Note that function copy_and_test in my example would not be necessarily visible to you. It may be in another translation unit.

So in order to "cover ranges" you'll have to discard all cases where the length is completely unknown.

Charusso updated this revision to Diff 233252.Dec 10 2019, 7:47 PM
Charusso marked 7 inline comments as done.
Charusso retitled this revision from [analyzer] CERT: StrChecker: 31.c to [analyzer] CERT: StrChecker: Implementing most of the STR31-C.
Charusso edited the summary of this revision. (Show Details)
  • Fix.
In D70411#1776460, @NoQ wrote:

Ok, so, what should the checker warn on? What should be the intended state machine for this checker?

We basically have two possible categories of answers to this question:

  • "The checker warns every time it finds an execution path on which unintended behavior occurs" - describe unintended behavior (say, "out-of-bounds buffer access") and define the checker in such a way that all warnings describe actual bugs of this kind (in our example, out-of-bound buffer accesses) as long as the rest of the static analyzer works correctly.
  • Or, you could say "The checker enforces a certain coding guideline on the user" - and then you should describe the guideline in a very short and easy-to-understand manner (say, "don't pass the string by pointer anywhere before you null-terminate it"), and then make checker check exactly this guideline. You may introduce some suppressions for intentional violations of the guideline, but the guideline itself should be simple, it should be always possible to follow it, and it should sound nice enough for developers to feel good about themselves when they follow it.

If I am right you are thinking about warn on the unsafe function calls which is the first case. I did that on by default. The idea is that, in a security manner we cannot allow hand-waving fixes. To measure stuff we could introduce options which serves the second case. I have added an option to suppress that common null-termination-by-hand idiom, so that I do not recommend anything. The CERT rules are not recommendations so I think now the warning is fine.

Yup, i mean, you can go as far as you want with generalizing over "fixed size buffer" in this approach, but what i'm saying is that you're still not addressing the whole train of thought about the length of the source string.

I see, and you are right, but we are getting closer and closer to catch the most of.


The source string is an IP address and port, which has a known limit on the number of digits it can have.

The known size does not mean that the string is going to be null-terminated.

It does, because strcpy is guaranteed to null-terminate as long as it has enough storage capacity.

It does not, because the storage capacity is arbitrary. If we would be sure the copied stuff's length cannot be larger than the destination's arbitrary capacity, we would not warn. There are tests for that case.

Checks that are part of the generic taint checker are currently in a pretty bad shape, but the taint analysis itself is pretty good, and checks that rely on taint but aren't part of the generic taint checker are also pretty good. I actually believe taint analysis could be enabled by default as soon as somebody goes through the broken checks and disables/removes them. If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

I think it is too far away, sadly. I like the idea because it would be the root of security checkers, but I am mostly interested in data-flow analysis.

clang/test/Analysis/cert/str31-c-fp-suppression.cpp
52–54

That unknowness idea is cool, I will leave that comment as not Done.

clang/test/Analysis/cert/str31-c-notes.cpp
23

I have found out we have another recommendation for that type of issues:
https://wiki.sei.cmu.edu/confluence/display/c/STR03-C.+Do+not+inadvertently+truncate+a+string

Here the truncation is a truncation, but in a security manner we cannot rely on feelings, so we warn from now.

27–28

Cool idea, thanks!

Charusso updated this revision to Diff 233903.EditedDec 13 2019, 6:42 PM
Charusso retitled this revision from [analyzer] CERT: StrChecker: Implementing most of the STR31-C to [analyzer] CERT: STR31-C.
  • Iterate over parameters rather arguments for searching string-reading.
  • Better notes of the argument's index.
  • Avoid C.getPredecessor(), use the error-node instead.
Charusso updated this revision to Diff 241752.Jan 31 2020, 9:17 AM
Charusso edited the summary of this revision. (Show Details)
  • 2020-ify the checker writing
  • Remove extra bullet-points of CERT checker documentation: we only need the checker's documentation, not the packages.
balazske added inline comments.
clang/docs/analyzer/checkers.rst
1935 ↗(On Diff #241752)

There are already more checkers that can check for CERT related problems but not specially made for these. These checkers do not reside in this new cert group. And generally a checker does not check for specifically a CERT rule, instead for more of them or other things too, or more checkers can detect a single rule. (And the user can think that only these CERT rules are checkable that exist in this package, that is not true.) So I do not like the introduction of this new cert package. (The documentation of existing checkers lists if the checker is designed for a CERT rule.)

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22

Are there already not other checkers that find security related bugs (the taint checker?)? Why do these not use a SecurityError? It is not bad to have a SecurityError but maybe there is a reason why was it not there already. If these categories are exclusive it is hard to find out what problem (probably already existing bug type in other checkers) belongs to what category (it can be for this checker UnixAPI or MemoryError too?).

Szelethus added inline comments.Mar 23 2020, 5:47 AM
clang/docs/analyzer/checkers.rst
1935 ↗(On Diff #241752)

I disagree to some extent. I think it would be great to have a cert package that houses all checkers for each of the rules with the addition of checker aliases. Clang-tidy has something similar as well!

Charusso marked 5 inline comments as done.Mar 23 2020, 7:40 PM

Thanks for the feedback! Given that it will remain an alpha checker for a long time (~1 year), no one really should use it.

clang/docs/analyzer/checkers.rst
1935 ↗(On Diff #241752)

I designed the checker only for the rule STR31-C that is why I have picked package cert. Clang-Tidy could aliasing checks. For example the check could be in the bugprone category aliased to cert and both could trigger the analysis.

the user can think that only these CERT rules are checkable

We only need to move security.FloatLoopCounter under the cert package. What else SEI CERT rules are finished off other than package cert? It is not my fault if someone could not package well or could not catch most of the issues of a SEI CERT rule or could not reach the SEI CERT group to note the fact the Analyzer catch a very tiny part of a rule. However, this patch package well and could catch most of the STR31-C rule.

clang/include/clang/StaticAnalyzer/Core/BugReporter/CommonBugCategories.h
22

We have only three pure security checkers: security.insecureAPI, security.FloatLoopCounter (FLP30-C, FLP30-CPP) and alpha.security.cert.pos.34c. The non-alpha checkers are very old, but Ted definitely made that category:

const char *bugType = "Floating point variable used as loop counter";
BR.EmitBasicReport(bugType, "Security", os.str().c_str(),
                   FS->getLocStart(), ranges.data(), ranges.size());

Every other security-ish checker does not catch the CERT rule's examples. May if the checker evolves I would pick MemoryError, but it will not evolve soon.

balazske added inline comments.Mar 24 2020, 10:04 AM
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
803

This text may be hard to understand. The option means "if it is off, the problem report happens only if there is no hand-written null termination of the string"? I looked at the CERT rule but did not find out why is null termination by hand important here. (I did not look at the code to find out what it does.) And "WarnOnCall" suggests that there is warning made on calls if on, or not at all or at other location if off, is this correct?

clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28

This documentation could be improved or left out.

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

Maybe I have wrong expectations about what this checker does but did understand it correctly yet. The main problem here is that dest may be not large enough (if size of src is not known). This would be a better text for the error message? And if this happens (the write outside) it is already a fatal error, can cause crash. If no buffer overflow happens the terminating zero is copied from src, so why would dst be not null terminated?

Charusso marked 4 inline comments as done.Mar 24 2020, 11:45 AM
Charusso added inline comments.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
803

You let the buffer overflow by a bugprone function call, then you adjust the null-terminator by simply buf[size] = '\0', then you made sure the buffer cannot overflow, since you have terminated it. It is a very common idiom therefore I do not think a hand-written null-termination is a security issue. The SEI CERT rules are all theoretical so you will not find anything useful in practice. My solution is practical.

This text may be hard to understand.

Please note that this text only made for Static Analyzer developers. Let us rephrase it then:

Whether the checker needs to warn on the bugprone function calls immediately or look for bugprone hand-written null-termination of bugprone function call made strings. It is a common idiom to null-terminate the string by hand after the insecure function call produce the string which could be misused so that it is on by default. It is useful to turn it off to reduce the noise of the checker, because people usually null-terminate the string by hand immediately after the bugprone function call produce the string.

Do you buy that?

clang/lib/StaticAnalyzer/Checkers/AllocationState.h
28

Totally, I have already forgotten I wanted to remove it. Thanks!

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

I like that error message because that is the truth, that is the issue. If it would crash so the operating system would detect a fatal error we should not develop Static Analyzer and vulnerabilities would not exist. That would be cool, btw.

balazske added inline comments.Mar 25 2020, 1:58 AM
clang/docs/analyzer/checkers.rst
1935 ↗(On Diff #241752)

So I can move this checker: D71510 into alpha.security.cert.err.33c?

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
803

First sentence is OK, the later part relatively better. Probably the explanations about why it is good to turn it on or off can be left out, these are misunderstandable. (If the manual null-termination is done, in one case it is a "common idiom but can be misused so make warning" on other case "these warnings generates only extra noise".)
It can be something like this (if correct):

Whether the checker wars on the function calls immediately or warns on bugprone hand-written null-termination of function call made strings. Default value is on because null-terminating the string by hand after the insecure function call could be misused. But it is a common idiom to manually null-terminate the strings immediately after the function calls so it could be turned off to reduce noise of the checker.

I do not like the word "bugprone", is it correct english? ("Problematic" or "insecure" could be used instead.)

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

There must be some misunderstanding here, I still do not know what this test does. Can probably somebody other explain it? After the first strcpy the src content is copied into dest. If src is not a null-terminated string this is itself a bug and strcpy is undefined behavior. If src has strlen(src) > 127 it is a bug again because the copy can write outside of dest past its end, into the stack frame of the function. After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing. (This strcpy call is the place for a warning. Probably not a fatal error because it is not sure, if size of src is not known.) Otherwise result of strcpy is a null-terminated string in dest. The next *dest = '\0' causes that dest points to an empty string, why is this needed here (it was already null terminated)? The following strcpy overwrites it anyway.

Charusso marked 6 inline comments as done.Mar 25 2020, 12:35 PM

"To prevent such errors, either limit copies through truncation or, preferably, ensure that the destination is of sufficient size to hold the character data" - from the rule's page.
Most of the projects are fine truncating by hand because the write happens in somewhat well-bounded strings: IP-addresses, names, numbers... I wanted to make this as practical as possible. Until you are having a null-terminated string without being read, you are most likely fine. Feel free to try this out, probably you would already understand the WarnOnCall option very well.

clang/docs/analyzer/checkers.rst
1935 ↗(On Diff #241752)

Well, not exactly. No one should care about alpha checkers except advanced Static Analyzer developers like you do right now. Since alpha is a dead-zone you can put your work anywhere. The core issue was Ted's placing of CERT rules.

I have written the first CERT-checker after Ted's, which introduced such packaging, then it became dead, so I have lost interest to make the package-aliasing a thing. If you wish to move to cert, feel free, but please note that, your first idea of bad-packaging is right. We really need the functionality of package-aliasing but I have ran out of time and the project died as well.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
803

bugprone comes from the Clang-Tidy world and we use it everywhere, but let us change it to insecure then. I will copy-paste it. Thanks!

clang/test/Analysis/cert/str31-c-notes-warn-on-call-off.cpp
68 ↗(On Diff #241752)

Well, I did not make any comments about what is going on, because it is far away of being finished, it is an alpha checker. Until we have not designed it, it is useless to write documentation, but feel free to read the discussion of the checker's evolution, so that I do not need to repeat myself. Once we have designed it, sure, I will document the design.

Let us first clear your misconceptions:

After the first strcpy the src content is copied into dest. If src is not a null-terminated string this is itself a bug and strcpy is undefined behavior.

Cool, but this is another rule namely STR32-C: D71033. Here we do not care about that issue. Each test should be isolated and test the behavior what we expect. We do not care about another patch/rule here.

If src has strlen(src) > 127 it is a bug again because the copy can write outside of dest past its end, into the stack frame of the function.

It is a bug in a security point of view. However, in the wild, no one really cares about it, and null-terminate the string by hand. That is why this is the test file when WarnOnCall=false to reduce the noise of the checker if someone is fine with such concept, but still want to catch non-null-terminated strings. So that this checker servers the needs of "enough space for holding the string" and "null-terminated strings" given by the assumption src is already null-terminated or dest is null-terminated by hand before the string is possibly read. Most of the C projects have a constant global array which could hold 1024 characters and they do not care if overflow happens, but still care if null-termination happens so truncate the string by hand.

After this happens, writing a null terminator does not correct the other overwritten data that still can cause crash or other thing.

Tell to the C open source project (like VIM) writers they are dumb. I still want to make sure they could benefit from this checker in a non-security point of view.

(This strcpy call is the place for a warning. Probably not a fatal error because it is not sure, if size of src is not known.)

Nope, D71033.

Otherwise result of strcpy is a null-terminated string in dest.

Nope, in a non-security point of view there is no issue, because the normal behavior of the program to read an IP-address or something bounded and just a truncation needed for the enter/space characters to the end. This is a normal C-programmer idiom and on projects where no-one cares about security, like VIM.

The next *dest = '\0' causes that dest points to an empty string,

Yes.

why is this needed here (it was already null terminated)? The following strcpy overwrites it anyway.

*dest = '\0' demonstrates the usual C-programmer behavior. They somehow null-terminate the string then everything is alright, no buffer-overflow read could happen. However, on the second time the null-termination may not happen since the dest buffer moves out of scope and possibly read by do_something(). If it is read in a non-null-terminated way, that is a problem in their world.

Probably you get it now why such option introduced. Feel free to try this out. The thing you will see, we cannot point out interesting properties of the bug, such as the destination buffer, so that we cannot give great bug reports, so that the checker remain alpha and useless.

Charusso updated this revision to Diff 254061.Mar 31 2020, 5:50 PM
Charusso marked 6 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Get rid of the secondary behavior for now.
  • Fix review comments.

Given that the secondary behavior confuse people I have removed it for now. May if someone introduce a NullTerminationChecker then we introduce such option to warn on insecure calls instant. Thanks @balazske for influencing that change. @NoQ this project had a deadline like half year ago, could you smash that green button please?

balazske added inline comments.Apr 1 2020, 1:29 AM
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56

The functions gets, strcpy return a null-terminated string according to standard, probably sprintf too, and the fscanf %s output is null-terminated too even if no length is specified (maybe it writes outside of the specified buffer but null terminated).

clang/test/Analysis/cert/str31-c.cpp
118

Do we get a warning if the +1 above is missing?

Charusso updated this revision to Diff 254420.Apr 1 2020, 10:05 PM
Charusso marked 4 inline comments as done.
  • Simplify tests.
  • Remove dead code, they are far away to being used.
  • Add an extra test case.

Thanks for the review, hopefully if I ping @NoQ in every round, it will be green-marked soon.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56

From where do you observe such information? I have not seen anything strcpy()-specific in the standard. My observation clearly says that, if there is not enough space for the data you *could* even write to overlapping memory, or as people says, you could meet nasal demons. It is called *undefined* behavior.

clang/test/Analysis/cert/str31-c.cpp
118

Test added.

Charusso updated this revision to Diff 254423.Apr 1 2020, 10:09 PM
  • Remove the last dead comment.
Charusso updated this revision to Diff 265959.May 24 2020, 9:19 PM
Charusso retitled this revision from [analyzer] CERT: STR31-C to [analyzer] CERT STR rule checkers: STR31-C.
  • Refactor.
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
798

Maybe we could have more descriptive names for these checkers and mention the number of the rule in the HelpText only.

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
67

I would avoid reusing the prefix check for functions that are not inherited from Checker. Alternatives could be handle, process etc. for modeling and verify, validate etc. for checking for error.

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
20

Are there so many things common among checkers for SEI CERT string rules that the best option is to implemenet them in a single huge class/file? Would not it be more appropriate to create a library instead for common functions and then implement each rule in a separate checker class? Huge classes are difficult to understand.

baloghadamsoftware requested changes to this revision.Jun 29 2020, 7:36 AM
baloghadamsoftware added inline comments.
clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56

From cppreference.com:

strcpy() -- Copies the null-terminated byte string pointed to by src, including the null terminator, to the character array whose first element is pointed to by dest.

gets() -- //https://en.cppreference.com/w/c/io/gets//

scanf, fscanf, sscan format %s -- If width specifier is used, matches up to width or until the first whitespace character, whichever appears first. Always stores a null character in addition to the characters matched (so the argument array must have room for at least width+1 characters)

Thus @balazske is completely right. All these functions add the null-terminator. Of course, if the string does not fit, it overwrites the memory after the buffer which is undefined behavior. But the null-terminator is definitely added.

101

Do we really need a separate utility function for that?

137

Create SmallString, use a stream to print into it, create another one in printPretty, convert it to std::string and then copy it again to the original SmallString? Why?

166

sprintf not non-compliant generally. You should calculate the possible maximal length of the string and compare it to the size of the target region.

181

Checking for %s is not enough. %2s may also be a problem if the buffer is 1 character long.

clang/test/Analysis/cert/str31-c.cpp
28

Why do we need a different buffer size here? And why enum?

This revision now requires changes to proceed.Jun 29 2020, 7:36 AM
whisperity added a subscriber: whisperity.

Please do not bypass the previous comments that hadn't reached a conclusion -- littering inlines about miscellaneous stuff at this stage does more harm then good, and derails the discussion.

Its not the first time I took a look, and I made some progress today again, but there is still much to learn, so I'll wait a bit before throwing my two cents in.

Charusso updated this revision to Diff 278368.Jul 15 2020, 9:14 PM
Charusso marked 18 inline comments as done.
Charusso edited the summary of this revision. (Show Details)
  • Resolve most of the review comments.
  • We really need to specify the design of future checkers.

Thanks for the reviews!

Please do not bypass the previous comments that hadn't reached a conclusion -- littering inlines about miscellaneous stuff at this stage does more harm then good, and derails the discussion.

Ugh, it is a long story. I really wanted to create the best and trendiest checker of all time with Artem so all the comments are made for him at first. We have had the exact opposite sense what we are trying to achieve here, that is why we left the project dead.

Hopefully I get your suggestion right and you want to make sure I have addressed all the comments. Let me begin with a suggestion as well:

If you are interested in the solution, please do not care about the problem, but the solution.

  • non-literal advice from It's Not About the Shark: How to Solve Unsolvable Problems

Let me rewind / summarize what problems happened in chunks if you would Ctrl+F5 for certain quotes and what we did in chronological order:

You're bringing in a completely brand-new machinery here, could you explain how it works and why do you need it?

I have attached multiple bug reports to the program state (by registering a map) and invalidated all the non-fatal reports in chain on the bug-path to a given fatal-report to support two functionality of the checker.

That two functionality was a complete stupidity it turned out, I have left only the necessary one.

I think it would really help if you draw a state machine for the checker

Because I have removed the complexity of the checker now it is easy to understand and it does not require an appendix.

But it doesn't necessarily mean that static analysis tools can be built by generalizing over the examples from the CERT wiki.

Artem refuse the usefulness of the CERT rules from the static analysis perspective, which I agree with.

"if string length metadata is tainted (in particular, if the string itself is tainted) and is potentially larger than the destination buffer size (eg., unconstrained), warn"

Artem suggests to develop taint analysis, which I do not want to develop, but his concerns are real.

Traditionally checkers either warn immediately when they can detect an error, or assume that the error has not happened.

That was the near the point when I have realized we need to warn on the function calls immediately. The buffer overflow is immediately a security issue, but I was dumb to realize.

I took a look at first 5 and in all of them the code does in fact ensure that the buffer space is sufficient, but in 5 different ways.

Artem believes the arrays are cool because they cannot overflow. For example we obtain an IP address, which has enough space as char[64] our mental model suggests. I believe any overflow is not cool in a security point of view: https://twitter.com/TwitterSupport/status/1283526400146837511

Now I have accepted Artem's suggestion and I do not care about arrays because the Analyzer cannot model the constraints of the array size yet / taintness. If someone working with a plain char array that could be easy to detect and measure. For the security problems we want to catch allocations.

The known size does not mean that the string is going to be null-terminated.
The problem is not the size, but the missing '\0' (which you can have multiple of at any point).

I still wanted to care about null-termination, however the entire checker should be about overflow.

  • "The checker warns every time it finds an execution path on which unintended behavior occurs"
  • "The checker enforces a certain coding guideline on the user": (say, "don't pass the string by pointer anywhere before you null-terminate it")
  • If you rely on the existing taint analysis infrastructure and make a good check, that'll be wonderful and would further encourage us to move taint analysis out of alpha.

Artem tried to create a direction for the checker, it was pretty complex. I have refused again, because all the people null terminate by hand.

This text may be hard to understand.

Balazs get confused by the second behavior, like Artem did for a while. I have tried to rephrase the documentation. With Balazs' help we made the documentation cool (Thanks again!).

Get rid of the secondary behavior for now.

That was the time when I have learnt complexity is a huge issue and people will get confused. I have removed the entire secondary behavior of null termination checking and moved some logic to STR32-C.


The conclusion: Now this checker is very strict and hopefully enough simple. I hope we could design the future of checker-writing.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
798

The naming is designed with Aaron and Artem, so that it is consistent with the Clang-Tidy naming. We really need to support aliasing names here.

clang/lib/StaticAnalyzer/Checkers/CMakeLists.txt
20

things common among checkers for SEI CERT string rules

Well, yes.

best option is to implemenet them in a single huge file

Sadly that is our standard now and there is no API to make it well-engineered by default.

This is my first real checker and I could not really be the pioneer of creating the new standard of checker writing as I am not that experienced on my own. I believe if all the Ericsson people would gather and brainstorm for a long period of time and create the initial better way of doing checkers as a design document we could improve such document as the whole community. If some extent of consensus met by most of the developers, we could start to create a better environment, which already would create the design how should we split up the files. I do not see the immediate gains of splitting up stuff as long as we do not create such design to make sure we do not need to split up or move code anymore.

I wanted to follow the latest trends with that checker and eventually come up with the design how to write checkers. After that I have realized it is way too much work for an individual and it is a very long run where such change is necessary: in function summaries (http://lists.llvm.org/pipermail/cfe-dev/2015-October/045730.html)

clang/lib/StaticAnalyzer/Checkers/cert/StrChecker.cpp
56

Well, let assume that we rely on the cppreference page from now on as our best approximation of the required behavior. Thanks for the notes.


strcpy():

Copies the null-terminated byte string pointed to by src, including the null terminator
All these functions add the null-terminator.

*Copies* does not mean it adds the null-terminator to end of the dest explicitly. It is possible the src is already not null-terminated so there is no way to copy the '\0', and so that the STR32-C rule made:
https://wiki.sei.cmu.edu/confluence/display/c/STR32-C.+Do+not+pass+a+non-null-terminated+character+sequence+to+a+library+function+that+expects+a+string


I had not checked the other functions in the standard. gets() and fscanf() is going to be null-terminated according to cppreference and the IEEE standard, cool. I will release my fix on that topic in D71033.

67

The core checkers which I have seen use the prefix check which I became in love with. I have not split the modeling/checking logic up enough well so there is no need to use prefix verify/validate for now, but I prefer to use the prefix eval for the same reason.

101

Yes, it is handy.

137

I like the style of the stream chained together and the easy usage of printPretty(). Basically that is why.

166

I really wanted to check for a single %s, but somehow I have removed it, facepalm. Thanks!

181

Yes, there are other problems to check. I wanted to catch the given examples at first to make sure the patch is easy to understand.

Thanks for the idea, we definitely need to use more of the dynamic size support. I have added this extra feature for both fscanf() and sprintf().

clang/test/Analysis/cert/str31-c.cpp
28

I have copied the examples of the rule STR31-C as-is. I like the variety of buffer sizes.