Since %n can be used to write to memory and traditionally has been used in format string
vulnerabilities, we wanted to present a warning when it is used. This warning is to dissuade
developers from using the potentially unsafe %n format specifier. Currently this warning is
surfaced under the flag -Wformat-n-specifier which is enabled under the group -Wformat.
The way this information is surfaced is pending discussion from this
RFC: https://lists.llvm.org/pipermail/cfe-dev/2021-September/068986.html.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
The patch description says what the change is, but not why it is the way it is.
In particular, it might be helpful to be more verbose than just stating that something is unsafe.
Since no discussion came out of the RFC I'll leave the warning under the -Wformat-n-specifier flag under -Wformat
unless there's other ideas brought up. Would appreciate any reviews at this points! :)
The trouble with this diagnostic is that it throws the baby out with the bathwater. It is possible to securely use %n, so we can't have this warning be on by default because it will have too high of a false positive rate. However, we typically don't introduce new warning flags that are off by default because experience has shown that users typically do not enable those.
Can we reduce the diagnostic's scope to only the problematic uses of %n instead of all uses? If all uses is the desired diagnostic, have you considered adding it to the bugprone module in clang-tidy instead?
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
9230 | FWIW, I don't understand why this is "unsafe" either. The problem with %n is not that it might be used intentionally; the problem is that it opens up an attack vector for unintentional (malicious) use. The programmer writes printf(buf, args...) where buf is under the attacker's control (for example a debug-log format string supplied in a config file), and then the attacker configures something like "%n" instead of "%s%d" (so the debug-logging routine ends up poking data instead of peeking it). This vulnerable printf(buf, ...) is exactly what -Wformat-security warns about. |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
9230 | FWIW, that's effectively how I view this as well -- it's kinda like -Wvla -- a diagnostic to say "congrats, you're using the feature!". However, unlike -Wvla, no one accidentally uses %n when they were expecting something else. %n isn't deprecated in either C or C++. |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
9230 | That's a really good point, I think you're right that this wouldn't be effective in catching string format vulnerabilities, because as you said, the attacker would need to control some portion of the format string which would already be caught by -Wformat-security. Although in fuchsia I think we still have a need for warning against even explicit usages of %n since it could allow an external user to pass a pointers into printf("%n", ptr) and write to addresses that we don't want to be written to. Thank you for the suggestion of clang-tidy. I think moving this warning to the bugprone module sounds like a good way to surface a warning while not impacting intended usages of %n. |
clang/include/clang/Basic/DiagnosticSemaKinds.td | ||
---|---|---|
9230 | Once it targets clang-tidy instead of clang, I stop caring. ;) But for the record, I still think you haven't understood how %n works. If you have let a malicious external user control the pointer value of ptr and point it at memory they don't own, you have already lost; at that point the arcane incantation printf("%n", ptr) is no more dangerous than a simple assignment *ptr = 42 (because both are ways of poking into memory that you've already postulated was chosen by the attacker). If that's really your concern (which, again, it shouldn't be), then maybe you should instead warn on scanf("%p") — off the top of my head I can't think of any other way for an external attacker to control the value of an arbitrary pointer ptr. |
Move check for %n from clang to clang-tidy as a checker under bugprone-percent-n-format-specifier.
clang-tools-extra/clang-tidy/bugprone/PercentNFormatSpecifierCheck.cpp | ||
---|---|---|
26–27 ↗ | (On Diff #381103) | |
43–57 ↗ | (On Diff #381103) | Rather than separate all these into individual matchers, I think it's better to use hasAnyName(). e.g., Finder->addMatcher( callExpr(callee(functionDecl( hasAnyName("::printf", "::vprintf", "::scanf", "::vscanf"))), hasArgument(0, stringLiteral().bind("StringLiteral"))), this); Also, it looks like this misses the wchar_t variants. One additional design question is whether we should consider user-specified functions which use the format attribute in general. Using that attribute implies the function handles format specifier strings, so it seems like those functions would also want to flag %n for this particular check. |
89 ↗ | (On Diff #381103) | FWIW, this diagnostic sounds more scary than I think it should. This implies to me that tidy has found an unsafe usage when in fact, tidy is only identifying that you have used the feature at all. Personally, I think it's more useful to limit the check to problematic situations. Use of %n by itself is not unsafe (in fact, I cannot think of a situation where use of %n in a *string literal* format specifier is ever a problem by itself. Generally, the safety concerns come from having a *non string literal* format specifier where an attacker can insert their own %n. If you want this check to be "did you use %n at all", then I think the diagnostic should read more along the lines of '%n' used as a format specifier instead. However, I question whether "bugprone" is the right place for it at that point, because it's not really pointing out buggy code. |
clang-tools-extra/docs/clang-tidy/checks/bugprone-percent-n-format-specifier.rst | ||
7–8 ↗ | (On Diff #381103) | Similar concerns about overselling the safety aspect of using %n. |
clang-tools-extra/clang-tidy/bugprone/PercentNFormatSpecifierCheck.cpp | ||
---|---|---|
43–57 ↗ | (On Diff #381103) | Thanks! Will change to use hasAnyName() and will add the wchar_t variants. Regarding the matching user-specified functions, I was interested in doing that but I'm struggling to find a way to match it. Do you have any suggestions? |
89 ↗ | (On Diff #381103) | I think that's fair and changing the wording to just calling out the usage of the feature makes sense. The original motivation behind this change was because Fuchsia plans to disable usage of %n altogether. So we could possibly move this check to be under "fuchsia" rather than "bugprone". That being said, I don't have full context behind the motivation to disable usage of %n but I believe that even explicit usage of the %n can be considered "bugprone" since it's difficult to guarantee that the pointer you are writing to comes from a reliable source. |
clang-tools-extra/clang-tidy/bugprone/PercentNFormatSpecifierCheck.cpp | ||
---|---|---|
43–57 ↗ | (On Diff #381103) |
I think something like functionDecl(hasAttr(clang::attr::Format)) should get you close -- that'll match the function declaration with the attribute, and from there you can look at the attribute string in the check() function. |
89 ↗ | (On Diff #381103) |
That would make me feel more comfortable.
I disagree that this is a bugprone pattern; that's like suggesting that use of %s is bugprone because you can't guarantee that the pointer being read from comes from a reliable source. The programmer specifies the pointer in both cases. There is absolutely nothing bugprone about: int n = 0; printf("%s, %s%n", "hello", "world", &n); |
clang-tools-extra/clang-tidy/bugprone/PercentNFormatSpecifierCheck.cpp | ||
---|---|---|
89 ↗ | (On Diff #381103) | Ok that seems reasonable, I'll move this check to fuchsia then. |
FWIW, I don't understand why this is "unsafe" either. The problem with %n is not that it might be used intentionally; the problem is that it opens up an attack vector for unintentional (malicious) use. The programmer writes printf(buf, args...) where buf is under the attacker's control (for example a debug-log format string supplied in a config file), and then the attacker configures something like "%n" instead of "%s%d" (so the debug-logging routine ends up poking data instead of peeking it). This vulnerable printf(buf, ...) is exactly what -Wformat-security warns about.
I am not aware of any vulnerability from intentional use of %n. At best, one could argue that there's a moral hazard: we might like to remove %n-support from our libc's printf, but we can't do that as long as there's any code out there in the wild that relies on intentional use of %n. Therefore, this is essentially a "deprecation warning" — but for a feature that AFAIK has never been deprecated, neither by C nor C++! (Am I wrong? Has someone officially deprecated %n?)