This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Detect usages of unsafe I/O functions
ClosedPublic

Authored by koldaniel on Jul 6 2017, 9:52 AM.

Details

Summary

There are certain unsafe or deprecated (since C11) buffer handling
functions which should be avoided in safety critical code. They
could cause buffer overflows. Two new checks had been implemented
which warn for every occurrence of such functions
(unsafe or deprecated printf, scanf family, and other buffer
handling functions, which now have a secure variant).

Diff Detail

Repository
rL LLVM

Event Timeline

koldaniel created this revision.Jul 6 2017, 9:52 AM

This does not do anything more than traversing the AST, shouldn't this be a clang-tidy check?
Also, i suspect CERT-MSC24-C might be relevant

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
607 ↗(On Diff #105452)

depr*e*cated

xazax.hun retitled this revision from Detect usages of unsafe I/O functions to [analyzer] Detect usages of unsafe I/O functions.Jul 7 2017, 2:43 AM
xazax.hun edited the summary of this revision. (Show Details)
xazax.hun added reviewers: dcoughlin, NoQ.
xazax.hun set the repository for this revision to rL LLVM.
xazax.hun added a subscriber: xazax.hun.
NoQ edited edge metadata.EditedJul 17 2017, 8:35 AM

It'd look good in clang-tidy (especially if extended to provide fixits), but if Daniel is interested in having this feature in the analyzer (and picked by clang-tidy from there), i wouldn't mind.

I wonder how noisy this check is - did you test it on large codebases? Because these functions are popular, and in many cases it'd be fine to use insecure functions, i wonder if it's worth it to have this check on by default. Like, if it's relatively quiet - it's fine, but if it'd constitute 90% of the analyzer's warnings on popular projects, that'd probably not be fine.

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
597–598 ↗(On Diff #105452)

Note that you cannot easily figure out if the code is intended to get compiled only under C11 and above - maybe it's accidentally compiled under C11 for this user, but is otherwise intended to keep working under older standards.

632 ↗(On Diff #105452)

Because it also checks deprecated buffer handling, i'd rename this function to checkDeprecatedOrUnsafeBufferHandling.

670–675 ↗(On Diff #105452)

You'd probably also want to quit early if the format string is not a literal.

In D35068#811437, @NoQ wrote:

It'd look good in clang-tidy (especially if extended to provide fixits), but if Daniel is interested in having this feature in the analyzer (and picked by clang-tidy from there), i wouldn't mind.

I wonder how noisy this check is - did you test it on large codebases? Because these functions are popular, and in many cases it'd be fine to use insecure functions, i wonder if it's worth it to have this check on by default. Like, if it's relatively quiet - it's fine, but if it'd constitute 90% of the analyzer's warnings on popular projects, that'd probably not be fine.

This patch basically extends an already existing static analyzer check. Even if tidy might be a better fit, I wonder what is the right thing to do in this case. We either end up overlapping functionality with the analyzer and tidy or have to come up with a policy what to do in this such cases.

xazax.hun added inline comments.Aug 8 2017, 1:32 AM
lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
622 ↗(On Diff #105452)

I would put a new line above and remove one bellow.

632 ↗(On Diff #105452)

Is the TODO still relevant in this line?

koldaniel marked 4 inline comments as done.Aug 29 2017, 4:50 AM
koldaniel added inline comments.
lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
597–598 ↗(On Diff #105452)

It is a possible scenario, how should I check if the checks should warn (safe functions are available) if not by using this method?

670–675 ↗(On Diff #105452)

If the format string is not a literal (i.e. a variable), currently we cannot determine if there were any restrictions regarding the size or not, so we want this check to warn.

koldaniel updated this revision to Diff 113065.Aug 29 2017, 5:12 AM

Updated checker name, minor modifications

koldaniel updated this revision to Diff 113094.Aug 29 2017, 8:00 AM

Renaming the unsafe checker, updating tests.

koldaniel updated this revision to Diff 117782.Oct 5 2017, 1:22 AM
xazax.hun added inline comments.Nov 2 2017, 8:19 AM
include/clang/StaticAnalyzer/Checkers/Checkers.td
382 ↗(On Diff #117782)

I do not like the naming of these two checks, It feels like one of them warns for a subset of the other, however, it is not the case.
What about removing the "deprecated" part from the first check?

lib/Driver/ToolChains/Clang.cpp
2256 ↗(On Diff #117782)

Do we want to turn on both of these by default? Warning on every use of memset will be very noisy for example.

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
147 ↗(On Diff #117782)

This is the only print-related function in this group. Is this intended?

597–598 ↗(On Diff #105452)

This is the same approach some check takes in clang tidy. I think it is still better to not warn for non C11 users than warn for everybody. If a project is not interested in this check but they are interested in having C11 builds, they can turn this check off. (Or it can be off by default in the first place).

koldaniel added inline comments.Nov 13 2017, 9:30 AM
include/clang/StaticAnalyzer/Checkers/Checkers.td
382 ↗(On Diff #117782)

Both checker warns if a buffer handling function is deprecated (DeprecatedOrUnsafeBufferHandling calls DeprecatedBufferHandling), but the DeprecatedOrUnsafeBufferHandling checker also warns if a function is not only deprecated but unsafe (i.e. writes a buffer without size restrictions) too.

lib/Driver/ToolChains/Clang.cpp
2256 ↗(On Diff #117782)

It's true that these checkers could be very noisy if they are enabled by default, I'm going to fix this.

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
147 ↗(On Diff #117782)

sprintf and vsprintf are buffer-writing functions without size restrictions, this is the reason why only these two are checked by DeprecatedOrUnsafeBufferHandling of the printf family. There are buffer-writing members of this family which are not unsafe (because of size restrictions) but have a secure version introduced in the C11 standard, and DeprecatedBufferHandling warns for them: swprintf, snprintf, vswprintf, vsnprintf. Other printf-related functions are either printing to a file or to the standard output.

xazax.hun added inline comments.Nov 15 2017, 5:39 AM
include/clang/StaticAnalyzer/Checkers/Checkers.td
382 ↗(On Diff #117782)

I see. Maybe it would be better to make them disjoint? Also, I think it is not a good user experience to get two warnings for the same function call.

koldaniel added inline comments.Nov 16 2017, 6:52 AM
include/clang/StaticAnalyzer/Checkers/Checkers.td
382 ↗(On Diff #117782)

Do you mean to separate them, so we would have one checker which warns for the unsafe and deprecated buffer handling functions, and one which warns for the deprecated functions which have some boundary restrictions?

koldaniel updated this revision to Diff 136261.Feb 28 2018, 2:57 AM
lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
100 ↗(On Diff #136261)

80 chars

165 ↗(On Diff #136261)

That's a lot of duplicated WalkAST::checkDeprecatedOrUnsafeBufferHandling. Could that be simplified?

618 ↗(On Diff #136261)

That's a lot of duplication of 1/0/-1.

And also 1/0/-1 are cryptic symbols, why not use an enum with a descriptive name?
Maybe use
.Cases("sprintf", "vsprintf", "vfscanf", WARN_UNSAFE) ?

koldaniel added inline comments.Mar 5 2018, 5:25 AM
lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
618 ↗(On Diff #136261)

The duplications will be solved by using .Cases, but is using enum necessary? 1 and 0 refers to the index of the argument which could be a format string. -1 means there is no need to look for a string like this.

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
618 ↗(On Diff #136261)

Right, then I guess 1 and 0 are self-descriptive, but I would still add a separate constant for -1.

koldaniel updated this revision to Diff 137162.Mar 6 2018, 5:10 AM

Do you have any other comments, or could this change be accepted?

@koldaniel Have you evaluated this checker? On which codebases? Were the warnings real security issues, or were they mostly spurious? The code seems fine, but I'm not sure whether it should be in security or in alpha.

@koldaniel Have you evaluated this checker? On which codebases? Were the warnings real security issues, or were they mostly spurious? The code seems fine, but I'm not sure whether it should be in security or in alpha.

I've evaluated this checker on LLVM+Clang, there were only a few (about 15) warnings, because of the C11 flag check at the beginning of the checker body. However, if this check was removed, number of the warnings would be increased significantly. I wouldn't say the findings were real security issues, most of the warnings were about usages of deprecated functions, which has not been considered unsecure (but which may cause problems if the code is modified in an improper way in the future).

Do you have any other comments? Could this checker be delivered into security?

george.karpenkov resigned from this revision.May 30 2018, 5:03 PM
george.karpenkov added a subscriber: george.karpenkov.

Hi, could you please take a look at this issue?

xazax.hun accepted this revision.Jan 17 2019, 3:09 AM

LGTM!

Any objections to commit this?
I think this is quiet coding guideline specific check which is useful for a set of security critical projects. As this is an opt in kind of check, I think it does no harm to have it upstream.

This revision is now accepted and ready to land.Jan 17 2019, 3:09 AM
Szelethus accepted this revision.Jan 17 2019, 10:05 AM

Overall I think this looks great, thanks! I left some inlines that would be nice to fix before commiting, but all of them are minor nits.

Would it be possible for you to commit the clang-formatting and the actual logic separately?

include/clang/StaticAnalyzer/Checkers/Checkers.td
393 ↗(On Diff #137162)

Lately we started paying attention to the 80 column limit in Checkers.td. Could you please break this line?

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
583 ↗(On Diff #137162)

This is out of place.

591 ↗(On Diff #137162)

Could you please add an extra newline?

598 ↗(On Diff #137162)

I think this newline should stay.

603–610 ↗(On Diff #137162)

I wonder whether using CallDescription would be (way) more efficient here. Comparing IndentifierInfos would also be better I guess.

I'm a little unsure about performance implications here, it might not be worth the chore to refactor this.

627–630 ↗(On Diff #137162)

Could you please start these variable names with a capital letter?

test/Analysis/security-syntax-checks.m
253 ↗(On Diff #137162)

When using {{}}, you actually supply a regex as an argument, and the output of the analyzer is matched against it. My point is, could you instead just write

// expected-warning{{Call to function 'sprintf' is insecure}}

to improve readability?

Szelethus added inline comments.Jan 17 2019, 10:07 AM
test/Analysis/security-syntax-checks.m
253 ↗(On Diff #137162)

Or whatever the shortest string is needed to know whether the expected output it there.

Szelethus requested changes to this revision.EditedJan 17 2019, 10:17 AM
In D35068#811436, @NoQ wrote:

I wonder how noisy this check is - did you test it on large codebases? Because these functions are popular, and in many cases it'd be fine to use insecure functions, i wonder if it's worth it to have this check on by default. Like, if it's relatively quiet - it's fine, but if it'd constitute 90% of the analyzer's warnings on popular projects, that'd probably not be fine.

@koldaniel Have you evaluated this checker? On which codebases? Were the warnings real security issues, or were they mostly spurious? The code seems fine, but I'm not sure whether it should be in security or in alpha.

Sorry, didn't read the discussion, there are some fair points in the quoted comments.

I've evaluated this checker on LLVM+Clang, there were only a few (about 15) warnings, because of the C11 flag check at the beginning of the checker body. However, if this check was removed, number of the warnings would be increased significantly. I wouldn't say the findings were real security issues, most of the warnings were about usages of deprecated functions, which has not been considered unsecure (but which may cause problems if the code is modified in an improper way in the future).

My problem is that LLVM+Clang isn't really a C (nor a C11) project, and I think judging this checker on it is a little misleading. Could you please test it on some C11 projects? I think tmux uses C11.

Edit: it doesn't, but CMake is mostly a C project and it does!

I think this is quiet coding guideline specific check which is useful for a set of security critical projects. As this is an opt in kind of check, I think it does no harm to have it upstream.

I do generally agree with this statement, but I'd be more comfortable either landing it in alpha or seeing some other results.

This revision now requires changes to proceed.Jan 17 2019, 10:17 AM

Edit: it doesn't, but CMake is mostly a C project and it does!

CMake really isn't a C project if you look at what language it actually uses - the C files come from tests and system utilities such as ZIP being reimplemented.
And as far as I've seen with my recent endeavours, Clang has problems with parsing and building CMake (however all these errors relate to some very niche standard library memory size related stuff not being there where it wants them).

I've evaluated this checker on LLVM+Clang, there were only a few (about 15) warnings, because of the C11 flag check at the beginning of the checker body. However, if this check was removed, number of the warnings would be increased significantly. I wouldn't say the findings were real security issues, most of the warnings were about usages of deprecated functions, which has not been considered unsecure (but which may cause problems if the code is modified in an improper way in the future).

My problem is that LLVM+Clang isn't really a C (nor a C11) project, and I think judging this checker on it is a little misleading. Could you please test it on some C11 projects? I think tmux uses C11.

Edit: it doesn't, but CMake is mostly a C project and it does!

What do we want to validate here? The lack of crashes? Or evaluate false positive ratio?

I have some doubts about evaluating this checker on open source projects. If a project does not care about the safe versions of these functions all of the results will be false positive (or a project might actually care but will not be able to comply due to portability constraints). If a project does care about using the safe variants, they are most likely already using another tool to verify this.
So I think the main value here is to subsume other tools.

To add an analogy, Clang Tidy will not require C++ Core Guidelines related checks to be evaluated on projects that are not following the guidelines as the results are meaningless for those projects.

Szelethus accepted this revision.Jan 21 2019, 1:44 AM

Yup, I'm sold on that.

This revision is now accepted and ready to land.Jan 21 2019, 1:44 AM
NoQ accepted this revision.Jan 28 2019, 4:08 PM

Ok! I hope that the C11 check would do the trick, let's see how it goes :)

What do we want to validate here? The lack of crashes? Or evaluate false positive ratio?

This checker is opt-in, but it still has a notion of a useless positive, and it is something that can often be improved upon. Crashes are also a great thing to look for :)

To add an analogy, Clang Tidy will not require C++ Core Guidelines related checks to be evaluated on projects that are not following the guidelines as the results are meaningless for those projects.

On the contrary, figuring out if the project follows C++ Core Guidelines is pretty hard. Well, it's often pretty easy to demonstrate that it doesn't (: But, i mean, the intention.

lib/StaticAnalyzer/Checkers/CheckSecuritySyntaxOnly.cpp
597–598 ↗(On Diff #105452)

Hmm, i guess in many cases such clients may suppress the warning through preprocessing. This will keep the checker warning in C11 builds but would also avoid miscompiles in builds for the older standards.

Could you rebase please? Many things have changed since you last update. After that I'll happily commit on your behalf if you don't have a commit access just yet.

Hmmm, DescFile was removed months ago from Checkers.td, are you sure you uploaded the correct diff?

Hmmm, DescFile was removed months ago from Checkers.td, are you sure you uploaded the correct diff?

No, you are absolutely right, an earlier diff had been uploaded, I will correct it.

koldaniel updated this revision to Diff 186131.Feb 9 2019, 12:21 PM

Rebased. Documentation added.

Szelethus accepted this revision.Feb 9 2019, 2:03 PM

This looks great, thanks! I'll commit sometime tomorrow.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptFeb 11 2019, 5:48 AM
NoQ added a comment.Mar 21 2019, 1:06 PM

There seems to be a crash in this code. @koldaniel, would you like to take a look? https://bugs.llvm.org/show_bug.cgi?id=41185

In D35068#1438498, @NoQ wrote:

There seems to be a crash in this code. @koldaniel, would you like to take a look? https://bugs.llvm.org/show_bug.cgi?id=41185

Hi,

True, it is a faulty scenario, my question is what should be the way forward? I think in case of built-in functions there should be no warning, since they differ from the deprecated ones which come from the old standard. The only purpose of the assert was to help development and maintenance (if a new function had been added, it should be decided if it is deprecated or unsafe). Returning instead of asserting would solve the problem.

NoQ added a comment.Mar 24 2019, 8:45 PM
In D35068#1438498, @NoQ wrote:

There seems to be a crash in this code. @koldaniel, would you like to take a look? https://bugs.llvm.org/show_bug.cgi?id=41185

Hi,

True, it is a faulty scenario, my question is what should be the way forward? I think in case of built-in functions there should be no warning, since they differ from the deprecated ones which come from the old standard. The only purpose of the assert was to help development and maintenance (if a new function had been added, it should be decided if it is deprecated or unsafe). Returning instead of asserting would solve the problem.

On many platforms such standard C functions are implemented as macros that expand to the respective builtins. I believe there should be no difference in behavior between the normal function and the builtin function. Cf. test/Analysis/bstring.c.

koldaniel updated this revision to Diff 192120.Mar 25 2019, 8:55 AM

Bug fixing: faulty handling of built-in functions.

Bug fixing: faulty handling of built-in functions.

Please open a new differential, that is a completely new patch.

Bug fixing: faulty handling of built-in functions.

Please open a new differential, that is a completely new patch.

Hi,

Done: https://reviews.llvm.org/D59812