This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add NoteTag for smart-ptr get()
Needs ReviewPublic

Authored by RedDocMD on Feb 22 2021, 12:27 AM.

Details

Summary

Removed a TODO by adding a note-tag to explain where the
raw pointer came from.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
RedDocMD requested review of this revision.Feb 22 2021, 12:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 22 2021, 12:27 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

@NoQ, @vsavchenko could you please review this?

Looks fine to me.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497

You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed.

RedDocMD updated this revision to Diff 325827.Feb 23 2021, 10:18 AM

Making output more gramatically sensible when pretty printing is unavailable

RedDocMD marked an inline comment as done.Feb 23 2021, 10:19 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497

You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed.

@steakhal Does it look better now?

steakhal added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497

Can we cover both branches with tests?

RedDocMD marked 2 inline comments as done.Feb 23 2021, 8:01 PM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497

I am not sure if this can be done. Because, right now the only Smart Pointer that has been handled is std::unique_ptr. It turns out that it can be pretty printed already, so I don't know how to test the other branch.

NoQ added a comment.Feb 25 2021, 9:07 PM

Nice!

I suspect you're adding too many notes. The note needs to not be there if the *raw* pointer is not tracked. Eg., I suspect that your patch would add a note in the following case in which it shouldn't be there because the raw pointer value doesn't participate in the report despite smart pointer region being interesting:

std::unique_ptr<A> P;
A *a = P.get(); // shound't emit a note here
P->foo();

It's important to not emit redundant notes because users typically take these checker-specific notes as an indication that this information is an essential piece of evidence of the bug in their program. In this example they'd believe that the analyzer has figured out that the smart pointer is null by looking at what happens to the raw pointer value. So they may become very confused if this isn't the case.

RedDocMD marked an inline comment as done.Feb 26 2021, 10:23 AM
In D97183#2589445, @NoQ wrote:

Nice!

I suspect you're adding too many notes. The note needs to not be there if the *raw* pointer is not tracked. Eg., I suspect that your patch would add a note in the following case in which it shouldn't be there because the raw pointer value doesn't participate in the report despite smart pointer region being interesting:

std::unique_ptr<A> P;
A *a = P.get(); // shound't emit a note here
P->foo();

It's important to not emit redundant notes because users typically take these checker-specific notes as an indication that this information is an essential piece of evidence of the bug in their program. In this example they'd believe that the analyzer has figured out that the smart pointer is null by looking at what happens to the raw pointer value. So they may become very confused if this isn't the case.

@NoQ, in the example you have given, isn't the smart-pointer P null? So shouldn't a warning be emitted for de-referencing it? Or is it that since a is not being used, a warning shouldn't be emitted? Sorry, I could not quite understand that point ...

NoQ added a comment.Feb 26 2021, 11:23 AM

The warning should be emitted but it shouldn't have a note at P.get() telling the user that an inner pointer was obtained.

@NoQ, I guess I would need to figure out a way to find out if the raw pointer obtained from get() is being used somewhere or is being constrained. I am trying to first figure out whether the raw pointer is being constrained to null, causing a null-deref to detected.

RedDocMD updated this revision to Diff 327396.Mar 2 2021, 2:44 AM

Should not emit note if raw pointer cannot be tracked

If the inner pointer participates in a branch condition guarding the dereference, that memory region is gotta be important, right? So, we should mark it so.
A BugreportVisitor could easily transfer the information about the fact that the dereference was guarded by that particular branch condition - and marking the InnerPointerVal (MemRegion) important.

This way the NoteTag for the get() could emit the warning.

The other approach, you @RedDocMD proposed about checking the constraint for the inner pointer, seems somewhat odd to me.
It could work, but I think the visitor is cleaner.

If the inner pointer participates in a branch condition guarding the dereference, that memory region is gotta be important, right? So, we should mark it so.
A BugreportVisitor could easily transfer the information about the fact that the dereference was guarded by that particular branch condition - and marking the InnerPointerVal (MemRegion) important.

This way the NoteTag for the get() could emit the warning.

The other approach, you @RedDocMD proposed about checking the constraint for the inner pointer, seems somewhat odd to me.
It could work, but I think the visitor is cleaner.

TBH, I don't like my approach either. I feel that it leaves out some cases.
The InnerPointerVal memory region is not marked as interesting as of now, I have tried that out. The branch condition constraint is set by the ConstraintManager and it is queried via in the State in the method smartptr::isNullSmartPtr at SmartPtrModelling.cpp:104. I have to see if the ConstraintManager can mark the memory region as important. @steakhal, @NoQ what do you think?

The InnerPointerVal memory region is not marked as interesting as of now, I have tried that out. The branch condition constraint is set by the ConstraintManager and it is queried via in the State in the method smartptr::isNullSmartPtr at SmartPtrModelling.cpp:104. I have to see if the ConstraintManager can mark the memory region as important.

Hm, I don't think you can make this work.
The deref bug is reported only if the smartptr::isNullSmartPtr(State, ThisRegion) is true. Which is only true if the InnerPointVal is known to be null. So the information on how we get to know that the smart pointer is null is already lost.
From this perspective, I don't think you have any other choice than to walk back from the bug to the root using a bugreport visitor - and check whether or not the inner pointer is used in a branch condition.
I might be wrong about this, since this was the first time I had a deeper look at the SmartPtrChecker.

The TaintBugVisitor could give you a hint on how to implement this.

NoQ added a comment.Mar 2 2021, 4:07 PM

In some cases BugReport.isInteresting(InnerPointerVal.getAsSymbol()) would yield us exactly what we want. But if there's no symbol we have no choice but to interact with the trackExpressionValue facility and this may turn out to be pretty challenging.

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode. That'd be a completely new form of interestingness for us to track. Or maybe pairs (exploded node, expression) so that to be more specific. Then we could query it from our note tag.

Sorry, was a bit caught up with assignments. I will try to come up with a better implementation with the advice given by @NoQ and @steakhal.

In D97183#2598806, @NoQ wrote:

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode.

@NoQ, what is tracking mode for an ExplodedNode?

I am trying to use scan-build on a file to see what sort of errors are reported:
./llvm-project/release/bin/scan-build -o . -enable-checker alpha.cplusplus.SmartPtr -analyzer-config alpha.cplusplus.SmartPtrModelling:ModelSmartPtrDereference=true clang++ -c uniq_deref.cpp.
@NoQ, @vsavchenko why does this not work?

NoQ added a comment.Mar 9 2021, 2:05 PM

why does this not work?

How does this not work? What does it say?

what is tracking mode for an ExplodedNode?

Mmm yeah, in hindsight i should have explained it much more.

First of all, we have "bug visitors". They are the entity that adds notes to bug reports. Note tags are a new way to add notes to the bug report but it's still the same visitors under the hood, i.e. there's visitor that scans note tags and invokes their callback to produce notes. These visitors scan the report from bottom to top. They typically emit note whenever something changes in program state. For instance, if an interesting pointer symbol in the program state changes its status from "allocated" to "release" (i.e., the visitor was so far only seeing nodes in which it was released but now it encounters the first node in which it's not released yet) then it adds a note "pointer was released" which is relevant for use-after-free warnings.

Now, sometimes we have to emit notes with respect to values that aren't symbols. For instance, in null dereference bugs we have to explain the movement of a value that's a plain and simple null pointer. Unlike symbols who each have unique identity that captures their backstory and properties, such "concrete" values are indistinguishable from each other. If we see a null in the older state and a null in a newer state, we can't tell if it's the same null or a different null. This makes it much harder to explain the journey a specific null value has undertaken in order to end up in our pointer that we've ended up dereferencing.

This is where trackExpressionValue comes in. It knows how to track concrete values such as Null or Undefined. The way it works is that it tracks something in the state that corresponds to that null value but does have an identity, typically either memory regions ("the null pointer is currently stored in this variable") or expressions ("the null pointer is currently being returned from this call expression"). Neither memory regions nor (especially!) expressions are guaranteed to stay the same throughout the entire journey so we have to skip from one to the other in order to keep tracking our null pointer (say, "the null pointer was returned from a call expression which acts as an initializer to a variable; now we can stop tracking the variable and start tracking the call expression"). This is what I referred to as changing modes. It's also very clear from the static analyzer code where the mode changes: namely, trackExpressionValue is not a single visitor but a combination of visitors that recursively attach new instances of themselves to the report as they ascend it. For instance, in the above example a visitor that tracks a variable would finish and attach a new visitor that tracks a call expression.

So basically i suspect that the act of reattaching the visitor could be documented through interestingness for your checker to pick up. That would allow you to query whether the call-expression P.get() returns an interesting null value as opposed to a dull, ordinary null value that's unrelated to the report.

In D97183#2615096, @NoQ wrote:

why does this not work?

How does this not work? What does it say?

Sorry, my bad! I had made a typo.

@NoQ, looking through the source code of trackExpressionValue I can see that it adds many visitors to the BugReport passed to it. That I believe is the recursive attachment of visitors you described above.
So, as far as I understood, I have to make changes in this function to mark an ExplodedNode as interesting when it changes tracking mode. This change is marked by when recursively a new visitor is attached (or at least in some of those places, the exact places will have to be figured out).
Then this can be queried from the checker to obtain the information that is needed.
Am I thinking on the right track?

NoQ added a comment.Mar 10 2021, 11:31 AM

I can see that it adds many visitors to the BugReport passed to it.

Yes and some of these visitors will call trackExpressionValue() again in their Visit...() functions which corresponds to adding visitors in the middle of visitation which is arguably the most interesting part.

RedDocMD updated this revision to Diff 329989.Mar 11 2021, 9:25 AM

Calling trackExpressionValue to mark InnerPointerVal as interesting

@NoQ, I am using trackExpressionValue to add intersetingness to the InnerPointerVal. That solves the original problem. However, it is causing the MoveChecker to add extra warnings to use-after-move cases. Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker. It seems to me that two separate checkers are tracking the same bug - use after move.
So should I make another patch to modify SmartPointerModelling to not emit warnings on use after move (instead just leaving the GDM updating code)? Or is there a better solution to this?

RedDocMD updated this revision to Diff 330422.Mar 12 2021, 9:54 PM

Removed an embarassingly dumb mistake

@NoQ, sorry for the absurdly dumb mistake. Not entirely sure what I was thinking.
Can you please have a look at it now?

@NoQ, could you please have a look at this?

@NoQ, could you please have a look at this?

NoQ added a comment.Mar 17 2021, 9:10 PM

By tracking the call-expression you're basically tracking the raw pointer value because that's what operators * and -> return. Of course operator * returns an lvalue reference rather than a pointer but we don't make a difference when it comes to SVal representation.

So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely. I think this is quite a statement! I'd like a stronger evidence for that than passing a couple of tests. Does the following test work?:

void test(std::unique_ptr<A> P) {
  A *a = P.get(); // unlike your positive test this doesn't deserve a note
                  // because we weren't looking at 'a' when we concluded
                  // that the pointer is null
  if (!P) {
    P->foo();
  }
}

Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker

Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?

RedDocMD updated this revision to Diff 331461.Mar 17 2021, 10:09 PM

Added a negative test

Does the following test work?

@NoQ, it seems to be working.

So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely

I would not say it has been dodged, but rather that problem had already been solved by trackExpressionValue. At line 1949 of BugReporterVisitors.cpp (inside the trackExpressionValue function) is:

if (LVState->getAnalysisManager().getAnalyzerOptions().ShouldTrackConditions)
    report.addVisitor(std::make_unique<TrackControlDependencyCondBRVisitor>(
          InputNode));

Approximately, TrackControlDependencyCondBRVisitor is a visitor that looks into condition statements and via mutual recursion with trackExpressionValue marks SVal's as interesting if they are used in a condition and that condition constrains the Expr on which the visitor was originally called on. This gave me the idea that calling trackExpressionValue is all that we really need to do, since it already contains a visitor to discover the interestingness we need. Looking into this function made me feel that trackExpressionValue is actually a very powerful function which solves a lot of these communication problems.

Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?

Yes, and I found out my blunder after staring at the exploded graph dump. Creating a new node was un-necessary since trackExpressionValue needs a node corresponding to the expression where we find the bug, and that was already being created above.

RedDocMD updated this revision to Diff 331469.Mar 17 2021, 10:52 PM

Added some more positive tests

I did not follow the discussion closely but we (CodeChecker team) might have a similar problem.
Consider this: https://godbolt.org/z/835P38

int do_bifurcation(int p) { return p < 0; }

int b(int x, int y) {
  int tmp = 13 / y;  // y can't be 0.
  (void)tmp;

  int p0 = do_bifurcation(x);  // There is a path where p0 is 0.

  int div = p0 * y; // So, div also becomes 0 on that path.
  return 1 / div;
}

However, the bugreport tells us that you do a division by zero, which was initialized a line above.

Do you think it is a related issue @NoQ?

NoQ added a comment.Mar 20 2021, 2:02 PM

No-no, TrackControlDependencyCondBRVisitor's purpose is completely different. It tracks symbols that didn't necessarily participate in the report but participated in conditional statements that lexically surround the report. It's used for explaining how did we got ourselves into a given lexical spot but it doesn't explain why is this a problem.

We can get it out of the way:

A *a = P.get(); // no note expected
if (!P) {}
P->foo();

vs.

A *a = P.get(); // expected note
if (!a) {}
P->foo();

I suspect that it may still work and the reason this works is because we're not collapsing .get()'s return value to null when it's constrained to null. Given that the only interesting thing that could happen to the return value of .get() is getting constrained to null (because it's an rvalue, the programmer can't use it to overwrite the raw pointer value inside the pointer), it's either already null or you'd see the constraint tracked once you mark the symbol as interesting.

The reason i'd still not like this solution is because collapsing the symbol to null on .get() (if it's already constrained to null) is arguably the preferred behavior as it makes constraint solver's life easier. But that'd most likely break your tracking solution.

Do you think it is a related issue @NoQ?

I don't see any smart pointers or null dereferences so... no? But there's definitely an issue with tracking in your example. It's definitely correct that div is initialized with zero on the path on which p0 is zero but the events that led to p0 being zero are not explained which is a bug that needs to be fixed.

clang/test/Analysis/smart-ptr-text-output.cpp
315–386

Looks like your git history is acting up. Your patch adds this test right? Are there more proposed changes in the cpp files that aren't currently highlighted for a similar reason?

I'll try to play with your patch locally once this is fixed ^.^

RedDocMD updated this revision to Diff 332147.Mar 20 2021, 11:34 PM

Fixed up the git history

RedDocMD marked an inline comment as done.Mar 20 2021, 11:35 PM
RedDocMD added inline comments.
clang/test/Analysis/smart-ptr-text-output.cpp
315–386

Yeah I seem to have tripped over the single commit rule. It should be fixed now.

RedDocMD updated this revision to Diff 332148.Mar 21 2021, 1:52 AM
RedDocMD marked an inline comment as done.

Re-formatted file

RedDocMD updated this revision to Diff 332193.Mar 21 2021, 9:58 PM

Added some more tests

RedDocMD added a comment.EditedMar 21 2021, 10:59 PM

@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)

void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}

When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?

RedDocMD updated this revision to Diff 332538.Mar 22 2021, 11:26 PM

Changed approach to use visitor

RedDocMD updated this revision to Diff 332540.Mar 22 2021, 11:29 PM

Repaired git history

RedDocMD updated this revision to Diff 332542.Mar 22 2021, 11:33 PM

Removed extra includes

@NoQ, I have taken a different approach this time. I have used a visitor and am storing some more data in the GDM. Together, they distinguish between the following three cases:

  1. If the raw pointer obtained from get() is constrained to null in a path which leads to a Node (and thus State) where a smart-pointer-null-deref bug occurs.
  2. If the raw pointer was null to begin with (because the smart-pointer was null)
  3. If the raw pointer was not null to begin with but the smart-ptr became null after that.

Only in the first case should the note be emitted. I have added some more tests to that effect.
Can you please have a look at this?

@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)

void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}

When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?

When the if's condition is evaluated, it probably triggered a state split. On one path the aptr (aka. the inner pointer) will be constrained to null.
The only way to be sure is by checking the exploded graph and see where it goes.


@NoQ, I have taken a different approach this time. I have used a visitor and am storing some more data in the GDM. Together, they distinguish between the following three cases:

  1. If the raw pointer obtained from get() is constrained to null in a path which leads to a Node (and thus State) where a smart-pointer-null-deref bug occurs.
  2. If the raw pointer was null to begin with (because the smart-pointer was null)
  3. If the raw pointer was not null to begin with but the smart-ptr became null after that.

Only in the first case should the note be emitted. I have added some more tests to that effect.

All in all, I see where it's going. I don't know, it might be the right thing to do. I haven't spent much time on this topic though.
See my inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
267

I'm not sure if we should expect 16 unique places where uptr::get() called on a path. I would guess 4 or 2 is more than enough.

286–289

So you are trying to find the assignment, where the inner pointer is assigned to a variable.
This visitor logic seems to be somewhat convoluted.

What you want to achieve is slightly similar to FindLastStoreBRVisitor. You should have a look at that.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
479

Nit: Declare the variable as close to the usage as you can. In the narrowest scope as well.

494–495

Why don't you 'save' the MemRegion only if the inner pointer is not proven to be null. This would relieve you from checking it later.

Nit: I don't like such if branches. The last statement is identical, which is a code smell.
It's better to think of this as a function taking stuff and producing a State.
An immediately called lambda expression would enclose any local variables and it would suggest that the algorithm it implements is self-contained.
I know that I'm the only immediately called lambda expression fan though.

RedDocMD marked 2 inline comments as done.Mar 25 2021, 5:16 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
267

Ok

286–289

That is what I had done before. @NoQ pointed out why this wouldn't work in a previous comment.

steakhal added inline comments.Mar 25 2021, 5:24 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289

Please elaborate on that.
I'm not saying that an already existing visitor would perfectly fit your needs. I'm just curious why a similar logic would not work for you. You are trying to iterate over a bunch of decls and init exprs etc. And there is nothing similar to the visitor I mentioned.

RedDocMD marked 3 inline comments as done.Mar 25 2021, 6:09 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289

Sorry. I should have written it out better.

So you are trying to find the assignment, where the inner pointer is assigned to a variable.

Yes and no. I am indeed trying to find where the first assignment occurred, since re-assigning to the pointer obtained from get() doesn't give any information regarding regarding the smart pointer being null or not. So what this visitor does that it goes back to the original assignment to find out what SVal was bound to a.get(). The Environment doesn't have this info since it is garbage collected somewhere on the way. Also accessing this State allows me to check whether the SVal was null to begin with.
I don't think FindLastStoreBRVisitor does this.

RedDocMD marked an inline comment as done.Mar 26 2021, 12:11 PM

@NoQ, what do you think?

NoQ added a comment.Mar 29 2021, 11:35 PM

@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)

void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}

When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?

When the if's condition is evaluated, it probably triggered a state split. On one path the aptr (aka. the inner pointer) will be constrained to null.
The only way to be sure is by checking the exploded graph and see where it goes.

Yes, that's, like, the whole point. We report unchecked use after a check. If the pointer is never null, why check? If the pointer is sometimes null, why use without a check? The code clearly doesn't make sense. That's what the report says: "assuming 'aptr' is null, there's null dereference on the next line". Once our simulation leaves the lexical scope of the if-condition 'aptr' doesn't suddenly become non-null. Technically speaking, we have no notion of a merge at all. There is literally no merge operation defined on our ProgramState, we do not perform fixpoint iteration, we do not try to combat path explosion, we simply simulate the behavior of the program on a few possible execution paths (defined by branching in the program) and report potential issues.

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289

I think it's too late to act when .get() is already happening. Like, we're visiting from bottom to top, in reverse to how our normal abstract interpretation goes. So if we only start tracking when we reach .get() we won't be able to explain what happens to the pointer between obtaining it from .get() and constraining it to null. For instance, if it was moved from one raw pointer variable to another, we won't put a note there, but we should.

Let's try the following. Write a visitor to detect the moment of time when the raw pointer value is getting constrained to null. I.e., find when State->assume(State->get<TrackedRegionMap>(ThisRegion)) stops working. It can be for two reasons: either the inner value is overwritten or it's constrained to null. If it's overwritten, track the newly set value and our job is done. If it's constrained to null, try to find out what's happening (is it an if-statement? is it an eagerly-assume action over a comparison operator?). We already have a common visitor that's good at figuring this out, maybe it'd be possible to reuse the code. In any case, start tracking the symbol and possibly emit a checker-specific note immediately ("raw pointer value constrained to null" or something like that).

Right, sorry for the late reply, @NoQ.
I will get to it once I get these assignments off my head.

For the following function:

void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}

Where do we expect a note? Where praw is initialized, where other is initialized or both?

For the following function:

void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}

Where do we expect a note? Where praw is initialized, where other is initialized or both?

I would expect no notes at all, since there is no bug.

For the following function:

void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}

Where do we expect a note? Where praw is initialized, where other is initialized or both?

I would expect no notes at all, since there is no bug.

According to the existing analyzer logic, there is a bug. If you check other for null, we can conclude that there are circumstances when it is null indeed.

For the following function:

void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}

Where do we expect a note? Where praw is initialized, where other is initialized or both?

I would expect no notes at all, since there is no bug.

According to the existing analyzer logic, there is a bug. If you check other for null, we can conclude that there are circumstances when it is null indeed.

I think we can conclude that P must be non-null (since it was unconditionally dereferenced), thus the previous check on the inner pointer and the branch it guards must be dead! This fact deserves a report, you are right. My bad.

In this case, the report should show how the inner pointer got bound to the other. Thus, we should highlight both assignments.

NoQ added a comment.Apr 20 2021, 10:31 AM

I think we can conclude that P must be non-null (since it was unconditionally dereferenced), thus the previous check on the inner pointer and the branch it guards must be dead!

Under the same logic we also can't report null dereference in the following code:

void bar() {
  A *p = nullptr;
  p->foo();
}

Indeed, the null pointer p is unconditionally dereferenced, therefore the entire function bar() must be dead!

Or maybe the entire executable binary into which this code is linked is never run. Some users definitely complain about static analyzer analyzing code that was entirely dead from the start, suggested integrating with the dynamic PGO facilities to analyze hot code first.

It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".

It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".

Tbh, given how loose of a memory model we are dealing with (at its worst, it is the C memory model), I think the static-analyzer does a great job at detecting what it possibly can. As for the user's expectation, I think we just need to wait for more adoption of the static analyzer. Then users will know exactly what to expect (we do not yell at the C++ compiler for not preventing use after delete, we do not yell at the Rust compiler for not allowing mismatching lifetimes - we know what to expect, and just work with that).

RedDocMD updated this revision to Diff 338933.Apr 20 2021, 10:56 AM

Changed to a different visitor

@NoQ, I have changed to the visitor that you suggested.

@NoQ?
(I actually should remove some extra includes and extra member fields)

NoQ added inline comments.May 3 2021, 1:55 PM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
183–185

SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());.

203–207

Your visitor doesn't need to track raw pointers through raw variables. trackExpressionValue() is fully capable of doing this. The problem only becomes checker-specific when the connection between smart pointers and raw pointers is involved. I meant assignments between two smart pointers.

Maybe even reduce the visitor to only the if-statement case and have it check all interesting smart pointers instead of a specific smart pointer. Propagation of interestingness across smart pointers can be handled by note tags for move/copy assignmnets and constructors.

RedDocMD updated this revision to Diff 346632.May 19 2021, 10:28 PM

Added a test for multiple get's

RedDocMD updated this revision to Diff 346633.May 19 2021, 10:33 PM

Removed unnecessary includes

vsavchenko added inline comments.May 20 2021, 1:52 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37

nit: class name should be a noun (functions and methods are verbs)

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
86

Probably forgotten

149–152

We probably should have a checker in clang-tidy (maybe we already do), for situations like this.
C++ has a long-lasting tradition that set's/map's insert return a pair: iterator + bool. The second part tells the user if the insert operation actually added the new element or it was already there.
This allows us to do 1 search instead of two.

154–176

Is there any reason it's not a method of FindWhereConstrained?

178

After that you have 3 distinct cases to handle. It's a good opportunity for extracting them into separate functions.

192–193

I think it's better to IgnoreParensAndCasts instead of manual traversal.

229

Variables are capitalized.

230

It is a widespread pattern in LLVM to declare such variables directly in if statements:

if (auto Report = bugReportOnGet(RHS))
  return Report;
240

So, situations like int *a = nullptr, *b = smart.get(); are not supported?

242

llvm::find_if

251–258

This level of nestedness is frowned upon. It is a good tell that the function should be refactored.
The following code:

if (cond1) {
  . . .
  if (cond2) {
    . . .
    if (cond3) {
      . . .
    }
  }
}
return nullptr;

can be refactored into:

if (!cond1)
  return nullptr;

. . .
if (!cond2)
  return nullptr;

. . .
if (!cond3)
  return nullptr;

. . .

It is easier to follow the logic if the function is composed in this manner because from the very beginning you know that else with more stuff is not going to follow.

RedDocMD updated this revision to Diff 346808.May 20 2021, 11:34 AM
RedDocMD marked 10 inline comments as done.

Code clean up

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
154–176

Not really.

192–193

What is IgnoreParensAndCasts`? I didn't find it in the source code anywhere (ripgrep, that is).

240

No it works even in that case (I have added a test for that). It's got to do with how the AST data structures are (int *a = nullptr, *b = smart.get(); is considered a single decl).

242

Not sure if that'll work neatly since I actually need the return value of the predicate function (the report).

251–258

Do you still think that's the case now? (After breaking it into functions).
I also think that for the sort of pattern matching we are doing in this patch, the nested if's make more sense.

RedDocMD updated this revision to Diff 346817.May 20 2021, 11:59 AM

Removed un-necessary includes

teemperor added inline comments.May 21 2021, 1:04 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
192–193

Just a typo, the actual name is IgnoreParenCasts (Expr::IgnoreParenCasts)

255

LLVM-code style mandates no curly braces around single-line ifs.

268

(I think this was already pointed out, but early-exits are the way to go in LLVM.

const auto *DS = llvm::dyn_cast<DeclStmt>(S));
if (!DS)
  return nullptr;
const Decl *D = DS->getSingleDecl();
const auto *VD = llvm::dyn_cast<VarDecl>(D);
if (!VD)
  return nullptr;
....

``

Right, @teemperor, I will do the refactors once we figure out how to utilize trackExpressionValue() here (because that will eliminate quite a bit of code from GetNoteVisitor, so no point in refactoring those bits).

RedDocMD updated this revision to Diff 347259.May 23 2021, 11:52 AM

Refactors to make code more stylistically accurate

RedDocMD updated this revision to Diff 347260.May 23 2021, 11:53 AM

Removed unnecessary include

I have put in the refactors, @teemperor.

vsavchenko added inline comments.May 26 2021, 1:54 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38

Sorry for picking on it again, but is it visiting "get notes"? Otherwise, it is just a placeholder name that doesn't tell the reader what this class is actually for.
Also, is there a reason why it is declared in the header file and couldn't be a part of SmartPtrChecker.cpp instead?

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79

I think it's better to use REGISTER_SET_FACTORY_WITH_PROGRAMSTATE instead and keep factory as part of the state as well.

169

Functions are actions, it is better to express actions with verbs.
Additionally, I don't think that it really reflects what it does without the verb either.

175–177

This type of predicates shouldn't be scattered throughout the code here and there. It should be definitely unified and put into a function that is shared with the checker and other parts of the model.
One simple example here - what if we need to support other types of smart pointers? Should we go around all over the code looking for the places like this or fix it in one place?
It also naturally creates a question "What about shared_ptr"?

192

Just wondering if return {}; will be sufficient here.

200–203

I think it's better to unite these two into if (!E || E->getCastKind()...)

206–208

I guess it escaped during code refactoring:
SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());

224–227

Similar note here: if (!BO || BO->getOpcode()...)

230–233

And here

499–500

I generally don't like repeating code in both branches of the if statement.
Here they share the following logic: add CallExpr and update the state.
We can easily update the code to share the code:

const auto *ExistingSet = State->get<ExprsFromGet>(ThisRegion);
auto BaseSet = ExistingSet ? *ExistingSet : StmtSetFactory.getEmptySet();
auto NewStmtSet = StmtSetFactory.add(BaseSet, CallExpr);
State = State->set<ExprsFromGet>(ThisRegion, NewStmtSet);
RedDocMD updated this revision to Diff 348165.May 26 2021, 10:41 PM
RedDocMD marked 10 inline comments as done.

More refactoring

RedDocMD updated this revision to Diff 348166.May 26 2021, 10:47 PM

Removed extra include

vsavchenko added inline comments.May 27 2021, 1:32 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37

It is again some sort of verb. I don't know how to read it except for "emit note on get"-visitor. What about something like GetNoteEmitter? I mean do we really need to put Visitor in the name?

38

This comment is marked as "Done", but there is no code change nor justification.

57

IMO, these three visit functions are a bit confusing in their names. I guess, my expectation for a group of methods that have very similar names is that they have similar signatures. And here visitIfStmt doesn't return diagnostic piece as opposed to two other methods.
We are not bound here to name everything visitABC since these methods are not part of the visitor interface.
My suggestion here is to actually keep visitIfStmt because that's what you actually do here, you simply visit it. Maybe change the return type to bool (signifying that it was indeed an if statement), so we don't check for other possibilities.
As for other two functions, I'd prefer something like emitNoteForAssignment and emitNoteForInitialization to keep the naming pattern going.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79

It is marked as "Done", but the code is same.

111

My guess is that it should be just isStdSmartPtr

171

I'm still here picking on names 😅 It doesn't emit "bug report" it emits "note".
And actually if you name your class as GetNoteEmitter or something similar you don't need to mention get here again.

198–202

I have two major questions about this implementation:

  • Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.
  • Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?
230

It's better to use IgnoreParenCasts here as well.

RedDocMD added inline comments.May 27 2021, 1:34 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38

Yes it is the line const SVal *InnerPtr = State->get<TrackedRegionMap>(ThisRegion); in the VisitNode function. This uses the TrackedRegionMap which is defined in SmartPtrModelling.cpp but the visitor must be added to the bug-report in SmartPtrChecker.cpp.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79

My bad, I should have put in the TODO.

192

Yup

499–500

Right, thanks :)

My bad! I forgot to submit the replies.

RedDocMD marked 9 inline comments as done.May 27 2021, 1:42 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37

I put in Visitor because all the visitors that I have come across end with *Visitor*. But then again, I am sure that if we look, we'll find counter-examples. So your call here ...

RedDocMD marked an inline comment as done.May 27 2021, 1:43 AM
vsavchenko added inline comments.May 27 2021, 1:49 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38

I see. IMO it still means that the visitor belongs with the checker and was put into this header as a workaround.
So maybe instead we can add getInnerPointer(const ProgramStateRef State, const MemRegion *ThisRegion) since we already have a very similar isNullSmartPtr in this header.

vsavchenko added inline comments.May 27 2021, 1:57 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37

That's true, but it doesn't mean that we shouldn't care about the names and how they read. You cannot say that you visit "emit note on get", so the name doesn't help to understand what this class does. If you want to name it GetVisitor - no problem because it does visit gets. But if you want to state in the name that it emits notes on gets, the name should say Emitter, and that second name is more specific.

NoQ added a comment.May 27 2021, 11:12 AM

A brief summary of an offline discussion we recently had.

(1) Basically we figured out that it's still necessary to do something like I originally suggested:

In D97183#2598806, @NoQ wrote:

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode. That'd be a completely new form of interestingness for us to track. Or maybe pairs (exploded node, expression) so that to be more specific. Then we could query it from our note tag.

This is necessary because the symbol produced by .get() is not immediately collapsed to a constant and it remains interesting as a symbol for the entire duration of the new visitor's lifetime, but there may be unrelated .get()s on the same smart pointer during said lifetime that don't deserve a note despite producing the same symbol.

(2) We also came up with a different approach to communicating with trackExpressionValue(). First of all, we probably don't need to mark all nodes/expressions on which trackExpressionValue() switches modes as interesting; we're only interested in the spot where tracking ends. This happens because the checker fully models .get() and therefore it's impossible for a generic solution like trackExpressionValue() to proceed with tracking as that would have required checker-specific machinery. We could reduce the scope of proposal (1) by only marking the last node as interesting but I have a better idea: let's add a callback to trackExpressionValue() that's invoked once tracking ends. In our case such callback would attach a checker-specific visitor to the smart pointer which solves our problem perfectly.

Such callback could be useful in a lot more cases though, because it provides us with an extremely generic benefit of knowing the origin of the value. We already demand such knowledge in a number of other machines that are currently hard-coupled to trackExpressionValue(): namely, i'm talking about inlined defensive check suppressions. Both of these suppressions basically say "if a null/zero value originates from a nested function call that was exited before the bug node, suppress the warning". These suppressions don't care where the value was passing through, they only care where it originated from. As such, by providing a callback for the origin of the value, we could decouple these suppressions and possibly even move them into the respective checkers (eg., the null dereference checker). I think this could be an excellent refactoring pass.

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
61–64

Typo!

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258

That's not what the assert is saying; the assert is saying that the DeclStmt has exactly one Decl. It basically forbids code like

int x = 1, y = 2;

. You may wonder why don't you crash all over the place. That's because Clang CFG creates its own DeclStmts that aren't normally present in the AST, that always have exactly one declaration. This is necessary because there may be non-trivial control flow between these declarations (due to, say, presence of operator ?: in the initializer) so they have to be represented as different elements (possibly even in different blocks) in the CFG.

RedDocMD added inline comments.May 27 2021, 11:22 AM
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258

So I guess the tests at lines 317 and 378 of smart-ptr-text-output.cpp work because of the CFG messing with the AST? So should I remove the assert?

RedDocMD marked 7 inline comments as done.Jun 2 2021, 11:15 AM
RedDocMD added inline comments.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258
RedDocMD updated this revision to Diff 349332.Jun 2 2021, 11:16 AM

Moved visitor entirely to SmartPtrChecker.cpp, other refactors, better naming.

Important question from @vsavchenko:

I have two major questions about this implementation:

  • Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.
  • Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?
NoQ added a comment.Jun 2 2021, 10:04 PM

Important question from @vsavchenko:

I have two major questions about this implementation:

  • Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.
  • Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?

I think there's no way around re-using/generalizing the logic from ConditionBRVisitor::VisitNode in some form. I guess you could try to separate the part where it looks at the current program point and finds out what's constrained. Then apply it to the moment of time where the interesting constraint appears (whereas ConditionBRVisitor continously scans all program points with the same hopefully-reusable logic).

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258

So I guess the tests at lines 317 and 378 of smart-ptr-text-output.cpp work because of the CFG messing with the AST?

Yes. The rest of the static analyzer works for the same reason; a lot of code relies on it.

So should I remove the assert?

The assert is correct but the message is wrong / misleading.