This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add check for null pointer passed to the %p of printf family
Needs RevisionPublic

Authored by CuriousGeorgiy on Jul 10 2023, 5:57 AM.

Details

Summary

The result of passing a null pointer to the pointer conversion specifier of
printf family of functions is implementation defined — for instance, on
Windows zeros are printed, whilst on Linux '(nil)' is printed.

The problem described above is a minor portability issue, so we don't just
add a check for to the existing Unix API checker, since it would be
desirable to be able to disable it.

Instead, to collect such checks we add a new APIPortabilityMinor checker to
the alpha.unix package with a check for the problem described above.

The check assumes that a null pointer is only passed to a pointer
conversion specifier, otherwise it would be undefined behaviour. By making
this assumption we do not have to check the format string and match data
arguments to conversion specifiers.

Closes #43453

Diff Detail

Event Timeline

CuriousGeorgiy created this revision.Jul 10 2023, 5:57 AM
Herald added a project: Restricted Project. · View Herald Transcript
CuriousGeorgiy requested review of this revision.Jul 10 2023, 5:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 10 2023, 5:57 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
CuriousGeorgiy added a comment.EditedJul 10 2023, 5:59 AM

Hi! This is my first time contributing to the LLVM project and particularly the clang static analyzer. This patch is based off https://reviews.llvm.org/D139604?id=481154. I have several questions regarding the reviewers comments (most of which I tried to address) and the patch itself:

  1. One should be able to disable your rule in case something goes wild. So, your checker should be registered at the end of the file by applying the REGISTER_CHECKER() macro.

This check is part of the UnixAPIPortability checker, which is already registered. Do you suggest creating a separate checker for this check?

  1. One should update the documentation and also mention the new check in the release docs to give visibility.

The current documentation for the UnixAPIPortability checker is quite generic (docs/analyazer/checkers.rst) and doesn’t mention specific checks. Should I mention this new check there? Also, I wasn't able to find the write place to add a changelog about the new check to.

  1. Add tests demonstrating all possible aspects of your change.

I added a number of tests for various aspects of the new check, but I’m not sure they are sufficient and I’m not sure what needs to be covered. For instance, should I add test that the warning is not issued for functions not located in the global namespace? Should I add a test case for nullptr?

A couple of questions regarding the patch itself:

  1. Should I cover non-standard (i.e., non ISO C standard) functions from the printf_s family? Should I cover non-standard functions like dprintf?
  2. What is the right way to emit a bug report? I have studied the code base and found several patterns: in some cases, an error node is simply generated, in others a new transition is also added based on the analyzed state. In some cases an expression value is also tracked.
  3. Should I try to report all possible bugs, or only report the first one found?
  4. What is the correct way to update expected plists for test inputs? I tried to simply replace the existing one (unix-fns.c.plist) with the output I get from executing the first command of unix-fns.c test, but I still get the following error:

georgiy.lebedev@georgiy-lebedev llvm-project % build-debug/bin/llvm-lit -sv clang/test/Analysis/unix-fns.c
llvm-lit: /Users/georgiy.lebedev/Work/llvm-project/llvm/utils/lit/lit/llvm/config.py:484: note: using clang: /Users/georgiy.lebedev/Work/llvm-project/build-debug/bin/clang
llvm-lit: /Users/georgiy.lebedev/Work/llvm-project/llvm/utils/lit/lit/util.py:439: note: using SDKROOT: '/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX13.3.sdk'
FAIL: Clang :: Analysis/unix-fns.c (1 of 1)
******************** TEST 'Clang :: Analysis/unix-fns.c' FAILED ********************
Script:
--
: 'RUN: at line 1';   /Users/georgiy.lebedev/Work/llvm-project/build-debug/bin/clang -cc1 -internal-isystem /Users/georgiy.lebedev/Work/llvm-project/build-debug/lib/clang/17/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -triple x86_64-apple-darwin10 -analyzer-checker=core,unix.API,osx.API,optin.portability /Users/georgiy.lebedev/Work/llvm-project/clang/test/Analysis/unix-fns.c -analyzer-output=plist -analyzer-config faux-bodies=true  -fblocks -verify -o /Users/georgiy.lebedev/Work/llvm-project/build-debug/tools/clang/test/Analysis/Output/unix-fns.c.tmp.plist
: 'RUN: at line 2';   grep -Ev '^[[:space:]]*<string>.* version .*</string>[[:space:]]*$|^[[:space:]]*<string>/.*</string>[[:space:]]*$|^[[:space:]]*<string>.:.*</string>[[:space:]]*$' </Users/georgiy.lebedev/Work/llvm-project/build-debug/tools/clang/test/Analysis/Output/unix-fns.c.tmp.plist | diff -ub /Users/georgiy.lebedev/Work/llvm-project/clang/test/Analysis/Inputs/expected-plists/unix-fns.c.plist -
: 'RUN: at line 3';   mkdir -p /Users/georgiy.lebedev/Work/llvm-project/build-debug/tools/clang/test/Analysis/Output/unix-fns.c.tmp.dir
: 'RUN: at line 4';   /Users/georgiy.lebedev/Work/llvm-project/build-debug/bin/clang -cc1 -internal-isystem /Users/georgiy.lebedev/Work/llvm-project/build-debug/lib/clang/17/include -nostdsysteminc -analyze -analyzer-constraints=range -setup-static-analyzer -analyzer-checker=core,unix.API,osx.API,optin.portability -analyzer-output=html -analyzer-config faux-bodies=true -fblocks -o /Users/georgiy.lebedev/Work/llvm-project/build-debug/tools/clang/test/Analysis/Output/unix-fns.c.tmp.dir /Users/georgiy.lebedev/Work/llvm-project/clang/test/Analysis/unix-fns.c
: 'RUN: at line 5';   rm -fR /Users/georgiy.lebedev/Work/llvm-project/build-debug/tools/clang/test/Analysis/Output/unix-fns.c.tmp.dir
--
Exit Code: 1

Command Output (stdout):
--
--- /Users/georgiy.lebedev/Work/llvm-project/clang/test/Analysis/Inputs/expected-plists/unix-fns.c.plist	2023-07-08 19:58:06
+++ -	2023-07-08 19:58:20
@@ -3,7 +3,6 @@
 <plist version="1.0">
 <dict>
  <key>clang_version</key>
-<string>clang version 17.0.0 (git@github.com:llvm/llvm-project.git 1882a4ee69b3cc2202d8d7d7b6465475f6d19886)</string>
  <key>diagnostics</key>
  <array>
   <dict>
@@ -4065,7 +4064,6 @@
  </array>
  <key>files</key>
  <array>
-  <string>/Users/georgiy.lebedev/Work/llvm-project/clang/test/Analysis/unix-fns.c</string>
  </array>
 </dict>
 </plist>

--

********************
********************
Failed Tests (1):
  Clang :: Analysis/unix-fns.c


Testing Time: 0.79s
  Failed: 1

Expect a review on around the next weekend due to vacations from me.

Instead of checking for hard-coded names, you can check functions with the format(printf, x, y) attribute:

if (auto *Format = FD->getAttr<FormatAttr>())
  CheckPrintfPointerConversionSpecifierNULL(C, CE, Format->getFormatIdx());

You also have to check for pointer types first. This currently warns on printf("%d", 0), since 0 is a null pointer constant (This warning should only be T* pointer types, C2x and C++ nullptr and GNU __null).

Also this only works for null pointer *constants*. printf("%p", (void*) 0) is pretty rare. You ideally want this to warn on the following too:

void f(void* p) {
    if (!p) printf("%p", p);
}
void g(void) {
    void* p = NULL;
    printf("%p", p);
}

Look into how the NonNullParamChecker works

NoQ added a comment.Jul 25 2023, 2:56 PM

Should I cover non-standard (i.e., non ISO C standard) functions from the printf_s family? Should I cover non-standard functions like dprintf?

The static analyzer, unlike the compiler proper, isn't required to treat all code fairly. It's ok to have different behavior depending solely on the function's name. So if you want to support non-standard functions, and you know that they have the same portability issues, totally go for it!

This check is part of the UnixAPIPortability checker, which is already registered. Do you suggest creating a separate checker for this check?

Probably a separate check would be better. The consequences of malloc(0) are likely to be much more dire than consequences of printf("%p", 0), so people may want to enable/disable them separately.

What is the correct way to update expected plists for test inputs?

The run-lines are mostly self-explanatory. Just run it through the grep command in the other run-line. It filters out all the non-transferable stuff. (You clearly don't want to have paths like /Users/georgiy.lebedev/Work/... in test files.)

Or, put your test in a separate file with a simpler run-line that doesn't additionally verify plist output. We already have enough tests for plist output.

Instead of checking for hard-coded names, you can check functions with the format(printf, x, y) attribute:

if (auto *Format = FD->getAttr<FormatAttr>())
  CheckPrintfPointerConversionSpecifierNULL(C, CE, Format->getFormatIdx());

Is the behavior necessarily implementation-defined for all such functions? I'm pretty sure you can find this attribute on a project-local function that will never have more than one implementation.

clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp
488–489

What is the right way to emit a bug report? I have studied the code base and found several patterns: in some cases, an error node is simply generated, in others a new transition is also added based on the analyzed state. In some cases an expression value is also tracked.

You always need to track the value when reporting things about values, it adds notes to the report explaining why do we think the variable has that value (eg., it was assigned null or compared to null).

Most checkers need this, though for some checkers it's more important than for other checkers.


Whether to generate a non-fatal error node and what state to put into it, this is indeed a very case-by-case decision. In your case the behavior is not undefined, the program doesn't crash when the bug happens, it continues to run normally, so analysis should probably also continue to find more defects. There are a few unknowns about the program's behavior (in particular, we don't know what it actually printed), but we never promised to figure this out in the first place. And even if there are unknowns that matter, well, unknowns are part of life and the rest of the static analyzer is already built under the assumption that not everything is known.

If it was an actual null pointer dereference, then you'd have to terminate the analysis by generating a fatal error node. The user won't be happy to learn about a "bug" that only occurs under the assumption that the program has already crashed with null dereference. And if it doesn't *only* occur under that assumption, then we'll probably find the same bug anyway while exploring a different execution path that doesn't crash.


A separate question is, do we want to assume that the pointer x is non-null after its first use in printf("%p", x)? We may or may not need to assume this regardless of whether we see a path on which it's definitely null. Eg., consider:

void foo(int *x) {
  // (1) We probably can't want to warn here, because there's no indication that 'x' can be null.
  printf("%p\n", x);

  // This is an indication that 'x' can be null.
  //
  // Technically it applies retroactively to the first line, but retroactive warnings are hard
  // because multiple different paths could reach this check, and it's not necessarily there
  // specifically for our current path.
  if (x == NULL) {
    printf("WARNING: null passed to foo()!");
  }

  // (2) Now, do we want to warn here? We probably do.
  //
  // But note that if we reached (2), we've already exhibited the same non-portable behavior
  // at (1) where no warning was emitted.
  printf("%p\n", x);

  // (3) Do we want to warn again here?
  printf("%p\n", x);

  // (4) What if it was a plain null dereference?
  printf("%d\n", *x);
}

If at (1), (2), (3) instead of printf("%p, x) we had a plain null dereference printf("%d\n", *x);, the null dereference checker will not emit any warnings. Indeed, the true-branch of the if-statement is dead code because the only way to reach it would be to go through undefined behavior at (1), but we also can't prove that there's any undefined behavior at (1). Yes we could separately warn about dead code, but that's a job for a different checker.

The null dereference checker accomplishes this by invoking addTransition(nonNullState) regardless of whether the warning is emitted (i.e. whenever notNullState is an actual state), and making the generateErrorNode(nullState) fatal. In particular, after (1) the pointer will continue to be unconditionally non-null.

In the example as written, you probably want a warning at (2). You also really want a null dereference warning at (4). So it's really counter-productive to transition to the notNullState at (1).

You probably don't care all that much about (3). Indeed, if you reach (3) in this analysis, you'll probably also reach it once (2) is addressed. The value of emitting all warnings at once is usually secondary to the nuisance of having too many warnings at once, so we'd probably still prefer it to warn only at (2).

But in order to have a warning at (4), you will also need to allow a warning at (3). This is because what has been assumed, cannot be unassumed. You can suppress the warning at (3) artificially by tracking a set of values or variables about which your checker has already warned, but that'd require a bit of work and I don't think it's necessary. It's probably ok to just warn at (3).

So I think your solution is correct.

517–521

This really doesn't accomplish anything, assume() already does a lot more than that. Just rely on assume(), or make it a completely path-insensitive check if you want to stick to literal constants.

CuriousGeorgiy updated this revision to Diff 548899.EditedAug 10 2023, 12:42 AM

Addressed review comments:

  • moved the checker to a separate UnixAPIPortabilityMinorChecker which can be enabled using optin.portabilityMinor.UnixAPI;
  • added expression value tracking for the pointer passed to the pointer conversion specifier;
  • removed the excessive NULL pointer constant check, which is covered by assume;
  • fixed the FP for 0 integer constants by checking that the argument's type is a pointer type, added a test case for this;
  • updated the expected plist for unix-fns.c correctly.
CuriousGeorgiy added a comment.EditedAug 10 2023, 1:00 AM

@MitalAshok thank you for the feedback!

Instead of checking for hard-coded names, you can check functions with the format(printf, x, y) attribute

@NoQ pointed out this is too generic, and we should go for a case-by-case approach.

You also have to check for pointer types first. This currently warns on printf("%d", 0), since 0 is a null pointer constant

Fixed, thanks for spotting!

Also this only works for null pointer *constants*.

No, it works for symbolic values too, please look at the test cases.

@NoQ thank you for the feedback too!

So if you want to support non-standard functions, and you know that they have the same portability issues, totally go for it!

AFAIC, the functions I listed are very exotic, so I would rather skip them.

Probably a separate check would be better. The consequences of malloc(0) are likely to be much more dire than consequences of printf("%p", 0), so people may want to enable/disable them separately.

I moved it to a separate checker, but I'm not sure about the naming, any suggestions? BTW, I guess it should be printf("%p", NULL), since as @MitalAshok pointed out, we should only consider pointer type values.

The run-lines are mostly self-explanatory. Just run it through the grep command in the other run-line. It filters out all the non-transferable stuff.

Sorry, my bad, I didn't notice this, updated the plist correctly now.

This really doesn't accomplish anything, assume() already does a lot more than that. Just rely on assume(), or make it a completely path-insensitive check if you want to stick to literal constants.

Dropped this check, thanks for point this out!

Looks pretty good.

clang/docs/analyzer/checkers.rst
704

This line should be just as long, as the line above.

705

Our docs aren't great, but we should have a brief description what the checker detects, basically here it would be "Find null pointers being passed to printf-like functions. Usually, passing null pointers to such functions is implementation defined, thus non-portable. Printf-like functions are: x, y, z."
Illustrate one example for diagnosing a case. Also illustrate one case similar to the bad one, that is fixed (basically ensure that the pointer is not null there).
And that's it.

clang/include/clang/StaticAnalyzer/Checkers/Checkers.td
45–50

I don't think we should create a separate package. A separate checker should be enough in alpha.unix, or alpha.optin, or just optin.

1671

Usually, the user-facing checker name which is the <"..."> part there, matches the name of the checker's class name. It helps for maintenance.
I would strongly suggest keeping this naming convention.

1673

Docs are not really optional these days, but we can come back later once the checker is ironed out.

clang/lib/StaticAnalyzer/Checkers/UnixAPIChecker.cpp
88

Usually, we put separate checkers into their own file.

89

Prefer using the check::PreCall callback over the PreStmt.

94

Nowdays, if I'm not mistaken, we just inline initialize these eagerly.
That way you could also avoid marking it mutable.

102

Feel free to just call it reportBug.
Also, in the cpp file, usually we use the namespaces clang,llvm,ento already. You don't need to fully-qualify these names.

593–594

For matching names, we usually use CallDescriptions to do it for us.
If think you could use them while also checking if the Call is a CallEvent::isGlobalCFunction(). I suspect, you only wanna check those.
I think you should define a CallDescriptionMap, because you will also need your specific data_arg_index...
Look for uses of such maps and you will get inspired.

clang/test/Analysis/unix-fns.c
81–92

I'd highly recommend not touching this file to avoid the bulk change in the expected plist output. Please just introduce a new test file.

280–289

I would appreciate a test demonstrating that the pointer argument won't get constrained after the printf call. I think such case was already discussed.

We should also demonstrate that unknown pointers (which doesn't known to be neither null, nor non-null) won't raise issues.

In the checker-logic, we have the data_args_index for each printf-like function.
Can you demonstrate that each is handled as expected?

Also have a test when the formatstring itself is null.

CuriousGeorgiy updated this revision to Diff 554259.EditedAug 29 2023, 4:28 AM
CuriousGeorgiy marked 2 inline comments as done.

Addressed review comments:

  • moved the checker source code to a separate file;
  • moved the checker tests to a separate file;
  • moved the checker to the alpha.unix package;
  • refactored the checker implementation;
  • added additional checker tests;
  • fixed nullptr literals not being warned about.
CuriousGeorgiy retitled this revision from [analyzer] Add check for null pointer passed to %p of printf family to [analyzer] Add check for null pointer passed to the %p of printf family.Aug 29 2023, 4:31 AM
CuriousGeorgiy edited the summary of this revision. (Show Details)

@steakhal

Thanks for the review comments!

This line should be just as long, as the line above.

Fixed.

Our docs aren't great, but we should have a brief description what the checker detects, basically here it would be "Find null pointers being passed to printf-like functions. Usually, passing null pointers to such functions is implementation defined, thus non-portable. Printf-like functions are: x, y, z."
Illustrate one example for diagnosing a case. Also illustrate one case similar to the bad one, that is fixed (basically ensure that the pointer is not null there).

Fixed. Added a generic comment about the checker collection, added a check list and described the printf pointer conversion specifier checker and provided TP and TN examples for it.

I don't think we should create a separate package. A separate checker should be enough in alpha.unix, or alpha.optin, or just optin.

Moved the checker to the alpha.unix package.

Usually, the user-facing checker name which is the <"..."> part there, matches the name of the checker's class name. It helps for maintenance.
I would strongly suggest keeping this naming convention.

Changed the checker name to APIPortabilityMinor, dropping the Unix prefix, which seems redundant as it is part of the checker path.

Usually, we put separate checkers into their own file.

Fixed.

Prefer using the check::PreCall callback over the PreStmt.

Fixed.

Nowdays, if I'm not mistaken, we just inline initialize these eagerly.
That way you could also avoid marking it mutable.

Fixed.

Feel free to just call it reportBug.

I cannot call it this way, since it is bug reporting function for the printf family pointer conversion specifier specific check.

Also, in the cpp file, usually we use the namespaces clang,llvm,ento already. You don't need to fully-qualify these names.

Fixed. Though, AFAIC, I still need to qualify the register and shouldRegister functions with the ento namespace, otherwise the ADL fails.

For matching names, we usually use CallDescriptions to do it for us.
If think you could use them while also checking if the Call is a CallEvent::isGlobalCFunction(). I suspect, you only wanna check those.
I think you should define a CallDescriptionMap, because you will also need your specific data_arg_index...
Look for uses of such maps and you will get inspired.

Thanks a lot for pointing this out, it looks very neat indeed, applied your suggestion, and the checker looks a lot cleaner now.

I'd highly recommend not touching this file to avoid the bulk change in the expected plist output. Please just introduce a new test file.

Fixed.

I would appreciate a test demonstrating that the pointer argument won't get constrained after the printf call. I think such case was already discussed.
We should also demonstrate that unknown pointers (which doesn't known to be neither null, nor non-null) won't raise issues.

Added these test cases to the printf_pointer_conversion_specifier_null_pointer_constraints test.

In the checker-logic, we have the data_args_index for each printf-like function.
Can you demonstrate that each is handled as expected?

AFAIC, I have already done this in the test_printf_pointer_conversion_specifier_null_various_arguments test.

Also have a test when the formatstring itself is null.

AFAIC, passing a null format string is already UB, so the check shouldn't handle this case specially. What test would you like to see?

steakhal requested changes to this revision.Aug 31 2023, 2:47 AM

I'd suggest renaming the checker to alpha.unix.NullFormatSpecifier.
Maybe add tests where multiple format specifiers are null, and also one test where no variadic args are present.

I also tried to simplify your solution a bit, so that we don't need to specify manually which arguments are for the "variadic" part.
Check it out here:

We should think about a shorter diagnostic message because it seems unnecessarily long to me.
The rest looks good to me.

I could do a measurement to see how this would behave in the wild.

clang/lib/StaticAnalyzer/Checkers/UnixAPIPortabilityMinorChecker.cpp
70–71 ↗(On Diff #554259)

Is this an unfinished sentence? I guess you could remove the of suffix.

86 ↗(On Diff #554259)

You always pass this state.

clang/test/Analysis/unix-api-portability-minor.cpp
5–13 ↗(On Diff #554259)
30–35 ↗(On Diff #554259)

This format makes the pattern more apparent.

42–43 ↗(On Diff #554259)

If printf had assumed that pointer1 must be non-null to make the postcondition of the function pass, the dereference would succeed afterward anyway.

To test if printf didn't assume this, just have this afterward to make this observable:

if (pointer1) {
  *pointer1 = 777;
} else {
  // We should have a warning here, as 'printf' didn't assume that the specifier is non-null.
  *pointer1 = 888; // expect-warning {{deref}}
}

If the pointer was constrained, then the else branch should be proven dead-code, and we should not have a warning

This revision now requires changes to proceed.Aug 31 2023, 2:47 AM