This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] ReverseNull: New checker to warn for pointer value conditions, if the pointer value is unconditionally non-null
AbandonedPublic

Authored by Szelethus on Mar 4 2022, 5:42 AM.

Details

Summary

Loosely tied to EXP34-C.

Suppose we do a pointer operation, such as dereferencing, or passing it to a function whose parameter is attributed with non null (like _Nonnull or clang::nonnull). Then, we write an if branch whose condition is this very pointer.

void tp1(int *p) {
  *p = 5;
  // expected-note@-1{{Pointer assumed non-null here}}
  // expected-note@-2{{Consider moving the condition here}}
  if (p)
    // expected-note@-1{{Pointer is unconditionally non-null here}}
    // expected-warning@-2{{Pointer is unconditionally non-null here [alpha.core.ReverseNull]}}
    return;
}

This is a sign of code smell: We either inteded to place the condition before the operation that already expects the pointer to be non-null, or there are some other high level issues in our code. The patch adds a new checker to find and warn for similar cases.

One of the fundamental ideas here is that this is an interprocedural checker. We want to ensure that the derefence (or similar operation) unconditionally leads to the condition without interleaving assignments. This sounds a lot like a dataflow analysis! Indeed, it would be great if we had a dataflow framework already in the bag, and although there is a work on it (a lot of patches can be seen on @ymandel's profile: https://reviews.llvm.org/people/revisions/17749/), its not quite there yet.

Another thought could be, that this checker is (somewhat) redundant with DeadCodeChecker, because this checker also finds dead code, but restricted to pointers. This wasn't a deterrent in developing this alpha checker, because if an 'else' branch is not present, the dead code will not be found (despite the sode smell being there), and restricting ourselves to pointer analysis takes a lot of weight off from our constraint solvers.

I've ran the checker on a lot of projects, and the results so far look promising, and I'm looking forward to publishing them ASAP.

Diff Detail

Event Timeline

Szelethus created this revision.Mar 4 2022, 5:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 5:42 AM
Szelethus requested review of this revision.Mar 4 2022, 5:42 AM

In addition to the excellent summary, I'd like to highlight that this is intended to catch only the cases where the use of the constrained pointer is in the very same stack frame where it was constrained. This leads to a really nice property: local reasoning, which greatly reduces the number of false-positives.

This should be emphasized in the doc segment of the checker as well.

Looks great. I'm loving it!
BTW what is the semantics of [p retain] in ObjC? Can p be null in that context? Or does it count as a dereferences, hence it constraints the pointer?

Why did you touch the AnalysisOrderChecker, should we separate those changes?

Additionally, why do you think it needs to be in the alpha.core package instead of the core?
What sort of false-positives have you seen in the wild justifying that classification?

clang/docs/analyzer/checkers.rst
1703
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
196–198

I think additionally to this, you should also check for the type of the Condition expression. It's gotta be a pointer.

211–216

IMO we should catch these as well:

void store(int *q, int value) {
  *q = value;
}

void top(int *p) {
  store(p, 5);
  if (!p)
    return;
}

In this case, the stack-frame in which the pointer gets constrained is not the same as where the redundant check resides - rather a child frame of that.

clang/test/Analysis/null-pointer-interference.c
18

"before this expression"?
The term here is not well defined IMO.

xazax.hun added inline comments.Mar 4 2022, 10:30 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

Consider the following code snippet:

void f(int *p, bool b)
{
  if (b) {
    *p = 4;
  }
  if (p) {
   ...
  }
}

I suspect that we would get a warning for the code above. I think warning on the code above might be reasonable (the values of b and p might be correlated but in some cases the analyzer has no way to know this, probably some assertions could make the code clearer in that case).

My problem is with the wording of the error message.
The warning Pointer is unconditionally non-null here on the null check is not true for the code above.

xazax.hun added inline comments.Mar 4 2022, 10:32 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

Also, if the check would warn for the code snippet above, the note "suggest moving the condition here" would also be incorrect.

Szelethus added inline comments.Mar 6 2022, 5:53 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

What if we demand that the the CFGBlock of the dereference must dominate the CFGBlock of the condition point?

xazax.hun added inline comments.Mar 6 2022, 9:42 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

I think it makes sense to warn both when the dereference dominates the null check, and when the null check post-dominates the dereference. We just want to give different error messages in those cases.

NoQ added a comment.Mar 7 2022, 11:03 AM

This check checks must-properties/all-paths properties. This has to be a data flow / CFG-based warning. I don't think there's a way around.

clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

What if we demand that the the CFGBlock of the dereference must dominate the CFGBlock of the condition point?

*p = 4;
if (b) {
  p = bar();
}
if (p) {
  ...
}
NoQ added a comment.EditedMar 7 2022, 11:07 AM

BTW what is the semantics of [p retain] in ObjC? Can p be null in that context? Or does it count as a dereferences, hence it constraints the pointer?

In Objective-C "methods" (messages) can be "called on" (sent to) null pointers. The well-defined result is that nothing happens and null is returned. If the method returns an integer, a zero is returned. If the method returns a structure by value, a zero-filled structure is returned. In Objective-C++, if the method returns a C++ object by value, a zero-filled object is returned (without calling the constructor; in particular, the object may be ill-formed from the start).

xazax.hun added inline comments.Mar 7 2022, 11:09 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

Yup, this is a nice example. I cannot think of an easy way around this using symbolic execution.

steakhal added inline comments.Mar 8 2022, 12:24 AM
clang/lib/StaticAnalyzer/Checkers/NullPtrInterferenceChecker.cpp
166

Ah, I see. However, empirically, the checker showed really promising results.

This check checks must-properties/all-paths properties. This has to be a data flow / CFG-based warning. I don't think there's a way around.

Oof. I concede on that. Do you think there is any point in turning this into an optin checker? Or the philosophy should be that this is just way too wild?

NoQ added a comment.Mar 8 2022, 12:28 PM

I guess there's the usual direction that I occasionally suggest: develop a way to verify that all possible paths were explored during symbolic execution (CoreEngine::hasWorkRemaining() on steroids), then do most of the work in checkEndAnalysis.

We also already have alpha.core.TestAfterDivZero that implements a somewhat similar check-after-use strategy.

Do you think there is any point in turning this into an optin checker? Or the philosophy should be that this is just way too wild?

We made such trade-offs in the past, the most obvious one being our strategy to always split paths on every if-statement and never merge. In particular, our core null dereference checker suffers a lot from this, and it's a major point of criticism we regularly receive from our users.

What I'm trying to say here is, I honestly think re-doing it as CFG-based should be relatively easy. We couldn't make any progress at all without those state splits but in this case it's much less of a fundamental issue so I think it's not worth it to hard-commit to a flawed solution from the start.

In fact you already have one of the building blocks:

What if we demand that the the CFGBlock of the dereference must dominate the CFGBlock of the condition point?

I mean, yes, this is a good strategy, you could implement that and then remove symbolic execution. Another building block could be "there aren't any writes into (or escapes of) that local variable between the use and the check". So it looks like this is entirely about statements of certain kind appearing in certain order in the CFG.

Another bonus from CFG-based checkers is that they can be turned into compiler warnings - which may affect a lot more users. I actually don't know why dead stores checker isn't a compiler warning.

I guess there's the usual direction that I occasionally suggest: develop a way to verify that all possible paths were explored during symbolic execution (CoreEngine::hasWorkRemaining() on steroids), then do most of the work in checkEndAnalysis.

I agree that having infrastructure for that would be useful. And I also agree that it might be overkill for this particular warning.

In fact you already have one of the building blocks:

The dataflow analysis framework from google is also starting to shape up quite nicely. It already has a lightweight modeling of the memory that could be used to check if the pointer was changed between two program points.

I actually don't know why dead stores checker isn't a compiler warning.

My guess would be simply no-one submitted a patch doing that.

Szelethus abandoned this revision.Mar 30 2022, 4:43 AM

Very well :) Let's abandon this in its current state, I share this sentiment:

What I'm trying to say here is, I honestly think re-doing it as CFG-based should be relatively easy. We couldn't make any progress at all without those state splits but in this case it's much less of a fundamental issue so I think it's not worth it to hard-commit to a flawed solution from the start.