This is an archive of the discontinued LLVM Phabricator instance.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497	You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed. @steakhal Does it look better now?

Harbormaster completed remote builds in B90428: Diff 325827.Feb 23 2021, 11:02 AM

steakhal added a reviewer: xazax.hun.Feb 23 2021, 11:16 AM

steakhal added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497	Can we cover both branches with tests?

Herald added a subscriber: rnkovacs. Feb 23 2021, 11:16 AM

RedDocMD marked 2 inline comments as done.Feb 23 2021, 8:01 PM

RedDocMD added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
496–497	I am not sure if this can be done. Because, right now the only Smart Pointer that has been handled is `std::unique_ptr`. It turns out that it can be pretty printed already, so I don't know how to test the other branch.

Nice!

I suspect you're adding too many notes. The note needs to not be there if the *raw* pointer is not tracked. E.g., I suspect that your patch would add a note in the following case, in which it shouldn't be there because the raw pointer value doesn't participate in the report despite the smart pointer region being interesting:

std::unique_ptr<A> P;
A *a = P.get(); // shouldn't emit a note here
P->foo();

It's important to not emit redundant notes because users typically take these checker-specific notes as an indication that this information is an essential piece of evidence of the bug in their program. In this example they'd believe that the analyzer has figured out that the smart pointer is null by looking at what happens to the raw pointer value. So they may become very confused if this isn't the case.

In D97183#2589445, @NoQ wrote:
Nice!

I suspect you're adding too many notes. The note needs to not be there if the *raw* pointer is not tracked. E.g., I suspect that your patch would add a note in the following case, in which it shouldn't be there because the raw pointer value doesn't participate in the report despite the smart pointer region being interesting:
std::unique_ptr<A> P;
A *a = P.get(); // shouldn't emit a note here
P->foo();
It's important to not emit redundant notes because users typically take these checker-specific notes as an indication that this information is an essential piece of evidence of the bug in their program. In this example they'd believe that the analyzer has figured out that the smart pointer is null by looking at what happens to the raw pointer value. So they may become very confused if this isn't the case.

@NoQ, in the example you have given, isn't the smart-pointer P null? So shouldn't a warning be emitted for de-referencing it? Or is it that since a is not being used, a warning shouldn't be emitted? Sorry, I could not quite understand that point ...

The warning should be emitted but it shouldn't have a note at P.get() telling the user that an inner pointer was obtained.

@NoQ, I guess I would need to figure out a way to find out if the raw pointer obtained from get() is being used somewhere or is being constrained. I am trying to first figure out whether the raw pointer is being constrained to null, causing a null-deref to be detected.

Should not emit note if raw pointer cannot be tracked

If the inner pointer participates in a branch condition guarding the dereference, that memory region has got to be important, right? So, we should mark it so.
A BugReporterVisitor could easily transfer the information about the fact that the dereference was guarded by that particular branch condition, marking the InnerPointerVal (MemRegion) as important.

This way the NoteTag for the get() could emit the warning.

The other approach, you @RedDocMD proposed about checking the constraint for the inner pointer, seems somewhat odd to me.
It could work, but I think the visitor is cleaner.

Harbormaster completed remote builds in B91533: Diff 327396.Mar 2 2021, 4:25 AM

In D97183#2596865, @steakhal wrote:

If the inner pointer participates in a branch condition guarding the dereference, that memory region has got to be important, right? So, we should mark it so.
A BugReporterVisitor could easily transfer the information about the fact that the dereference was guarded by that particular branch condition, marking the InnerPointerVal (MemRegion) as important.

This way the NoteTag for the get() could emit the warning.

The other approach, you @RedDocMD proposed about checking the constraint for the inner pointer, seems somewhat odd to me.
It could work, but I think the visitor is cleaner.

TBH, I don't like my approach either. I feel that it leaves out some cases.
The InnerPointerVal memory region is not marked as interesting as of now; I have tried that out. The branch condition constraint is set by the ConstraintManager and it is queried via the State in the method smartptr::isNullSmartPtr at SmartPtrModeling.cpp:104. I have to see if the ConstraintManager can mark the memory region as important. @steakhal, @NoQ what do you think?

In D97183#2597099, @RedDocMD wrote:

The InnerPointerVal memory region is not marked as interesting as of now; I have tried that out. The branch condition constraint is set by the ConstraintManager and it is queried via the State in the method smartptr::isNullSmartPtr at SmartPtrModeling.cpp:104. I have to see if the ConstraintManager can mark the memory region as important.

Hm, I don't think you can make this work.
The deref bug is reported only if smartptr::isNullSmartPtr(State, ThisRegion) is true, which is only the case if the InnerPointerVal is known to be null. So the information on how we got to know that the smart pointer is null is already lost.
From this perspective, I don't think you have any other choice than to walk back from the bug to the root using a bugreport visitor - and check whether or not the inner pointer is used in a branch condition.
I might be wrong about this, since this was the first time I had a deeper look at the SmartPtrChecker.

The TaintBugVisitor could give you a hint on how to implement this.

In some cases BugReport.isInteresting(InnerPointerVal.getAsSymbol()) would yield us exactly what we want. But if there's no symbol we have no choice but to interact with the trackExpressionValue facility and this may turn out to be pretty challenging.

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode. That'd be a completely new form of interestingness for us to track. Or maybe pairs (exploded node, expression) so as to be more specific. Then we could query it from our note tag.

Sorry, was a bit caught up with assignments. I will try to come up with a better implementation with the advice given by @NoQ and @steakhal.

In D97183#2598806, @NoQ wrote:

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode.

@NoQ, what is tracking mode for an ExplodedNode?

I am trying to use scan-build on a file to see what sort of errors are reported:
./llvm-project/release/bin/scan-build -o . -enable-checker alpha.cplusplus.SmartPtr -analyzer-config alpha.cplusplus.SmartPtrModelling:ModelSmartPtrDereference=true clang++ -c uniq_deref.cpp.
@NoQ, @vsavchenko why does this not work?

why does this not work?

How does this not work? What does it say?

what is tracking mode for an ExplodedNode?

Mmm yeah, in hindsight i should have explained it much more.

First of all, we have "bug visitors". They are the entity that adds notes to bug reports. Note tags are a new way to add notes to the bug report but it's still the same visitors under the hood, i.e. there's a visitor that scans note tags and invokes their callback to produce notes. These visitors scan the report from bottom to top. They typically emit a note whenever something changes in program state. For instance, if an interesting pointer symbol in the program state changes its status from "allocated" to "released" (i.e., the visitor was so far only seeing nodes in which it was released but now it encounters the first node in which it's not released yet) then it adds a note "pointer was released" which is relevant for use-after-free warnings.
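To make the bottom-to-top scan concrete, here is a toy model of such a visitor. It is only a sketch: `ToyState` and `visitBottomUp` are hypothetical names invented for illustration and are not part of the analyzer's API, and a real path consists of exploded nodes rather than a flat vector.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the program state at one node of the bug path.
// It only records whether the one interesting pointer has been released.
struct ToyState {
  int Line;      // source line this node corresponds to (1-based)
  bool Released; // status of the interesting pointer at this node
};

// Walk the path from the bug (last element) back towards the start and emit
// a note at the node where the status changes: the node itself sees the
// pointer as released, but its predecessor does not yet.
std::vector<std::string> visitBottomUp(const std::vector<ToyState> &Path) {
  std::vector<std::string> Notes;
  for (std::size_t I = Path.size(); I-- > 1;) {
    const ToyState &Node = Path[I];
    const ToyState &Pred = Path[I - 1];
    assert(Node.Line > 0 && "toy lines are 1-based");
    if (Node.Released && !Pred.Released)
      Notes.push_back("line " + std::to_string(Node.Line) +
                      ": pointer was released");
  }
  return Notes;
}
```

The point of the sketch is the direction of the loop: the note is attached where the older state first differs from the newer one, which is how a use-after-free report ends up with its "pointer was released" note.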

Now, sometimes we have to emit notes with respect to values that aren't symbols. For instance, in null dereference bugs we have to explain the movement of a value that's a plain and simple null pointer. Unlike symbols, which each have a unique identity that captures their backstory and properties, such "concrete" values are indistinguishable from each other. If we see a null in the older state and a null in a newer state, we can't tell if it's the same null or a different null. This makes it much harder to explain the journey a specific null value has undertaken in order to end up in our pointer that we've ended up dereferencing.

This is where trackExpressionValue comes in. It knows how to track concrete values such as Null or Undefined. The way it works is that it tracks something in the state that corresponds to that null value but does have an identity, typically either memory regions ("the null pointer is currently stored in this variable") or expressions ("the null pointer is currently being returned from this call expression"). Neither memory regions nor (especially!) expressions are guaranteed to stay the same throughout the entire journey so we have to skip from one to the other in order to keep tracking our null pointer (say, "the null pointer was returned from a call expression which acts as an initializer to a variable; now we can stop tracking the variable and start tracking the call expression"). This is what I referred to as changing modes. It's also very clear from the static analyzer code where the mode changes: namely, trackExpressionValue is not a single visitor but a combination of visitors that recursively attach new instances of themselves to the report as they ascend it. For instance, in the above example a visitor that tracks a variable would finish and attach a new visitor that tracks a call expression.
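The "changing modes" idea can be sketched with a toy tracker that follows a null value backwards by switching the place it watches. This is an assumption-laden illustration: `ToyStore` and `trackBackwards` are made-up names, places are plain strings, and the real trackExpressionValue works on memory regions, expressions and exploded nodes instead.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical event on the path: the value held by Src is stored into Dest.
// A place can be a variable name or a description of a call expression.
struct ToyStore {
  int Line;
  std::string Dest;
  std::string Src;
};

// Walk the path bottom to top. Whenever the place we are tracking receives
// its value from another place, emit a note and start tracking the source
// instead. That hand-over is the toy version of a "mode change".
std::vector<std::string> trackBackwards(const std::vector<ToyStore> &Path,
                                        std::string Tracked) {
  assert(!Tracked.empty() && "need a place to start tracking from");
  std::vector<std::string> Notes;
  for (auto It = Path.rbegin(); It != Path.rend(); ++It) {
    if (It->Dest != Tracked)
      continue; // unrelated store, nothing to explain
    Notes.push_back("line " + std::to_string(It->Line) + ": null moved from '" +
                    It->Src + "' to '" + It->Dest + "'");
    Tracked = It->Src;
  }
  return Notes;
}
```

In this model, recording each hand-over (for example as a pair of line and source place) would be the analogue of marking the reattachment as interesting so that a checker can query it later.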

So basically i suspect that the act of reattaching the visitor could be documented through interestingness for your checker to pick up. That would allow you to query whether the call-expression P.get() returns an interesting null value as opposed to a dull, ordinary null value that's unrelated to the report.

In D97183#2615096, @NoQ wrote:

why does this not work?

How does this not work? What does it say?

Sorry, my bad! I had made a typo.

@NoQ, looking through the source code of trackExpressionValue I can see that it adds many visitors to the BugReport passed to it. That I believe is the recursive attachment of visitors you described above.
So, as far as I understood, I have to make changes in this function to mark an ExplodedNode as interesting when it changes tracking mode. This change is marked by when recursively a new visitor is attached (or at least in some of those places, the exact places will have to be figured out).
Then this can be queried from the checker to obtain the information that is needed.
Am I thinking on the right track?

I can see that it adds many visitors to the BugReport passed to it.

Yes and some of these visitors will call trackExpressionValue() again in their Visit...() functions which corresponds to adding visitors in the middle of visitation which is arguably the most interesting part.

Calling trackExpressionValue to mark InnerPointerVal as interesting

Harbormaster completed remote builds in B93326: Diff 329989.Mar 11 2021, 9:34 AM

@NoQ, I am using trackExpressionValue to add interestingness to the InnerPointerVal. That solves the original problem. However, it is causing the MoveChecker to add extra warnings to use-after-move cases. Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker. It seems to me that two separate checkers are tracking the same bug - use after move.
So should I make another patch to modify SmartPointerModelling to not emit warnings on use after move (instead just leaving the GDM updating code)? Or is there a better solution to this?

Removed an embarrassingly dumb mistake

Harbormaster completed remote builds in B93640: Diff 330422.Mar 12 2021, 9:54 PM

@NoQ, sorry for the absurdly dumb mistake. Not entirely sure what I was thinking.
Can you please have a look at it now?

@NoQ, could you please have a look at this?

By tracking the call-expression you're basically tracking the raw pointer value because that's what operators * and -> return. Of course operator * returns an lvalue reference rather than a pointer but we don't make a difference when it comes to SVal representation.

So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely. I think this is quite a statement! I'd like stronger evidence for that than passing a couple of tests. Does the following test work?

void test(std::unique_ptr<A> P) {
  A *a = P.get(); // unlike your positive test this doesn't deserve a note
                  // because we weren't looking at 'a' when we concluded
                  // that the pointer is null
  if (!P) {
    P->foo();
  }
}

Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker

Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?

Added a negative test

Harbormaster completed remote builds in B94390: Diff 331461.Mar 17 2021, 10:10 PM

Does the following test work?

@NoQ, it seems to be working.

So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely

I would not say it has been dodged, but rather that problem had already been solved by trackExpressionValue. At line 1949 of BugReporterVisitors.cpp (inside the trackExpressionValue function) is:

if (LVState->getAnalysisManager().getAnalyzerOptions().ShouldTrackConditions)
    report.addVisitor(std::make_unique<TrackControlDependencyCondBRVisitor>(
          InputNode));

Approximately, TrackControlDependencyCondBRVisitor is a visitor that looks into condition statements and, via mutual recursion with trackExpressionValue, marks SVals as interesting if they are used in a condition and that condition constrains the Expr on which the visitor was originally called. This gave me the idea that calling trackExpressionValue is all that we really need to do, since it already contains a visitor to discover the interestingness we need. Looking into this function made me feel that trackExpressionValue is actually a very powerful function which solves a lot of these communication problems.

Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?

Yes, and I found out my blunder after staring at the exploded graph dump. Creating a new node was unnecessary since trackExpressionValue needs a node corresponding to the expression where we find the bug, and that was already being created above.

Added some more positive tests

Harbormaster completed remote builds in B94398: Diff 331469.Mar 17 2021, 10:52 PM

I did not follow the discussion closely but we (CodeChecker team) might have a similar problem.
Consider this: https://godbolt.org/z/835P38

int do_bifurcation(int p) { return p < 0; }

int b(int x, int y) {
  int tmp = 13 / y;  // y can't be 0.
  (void)tmp;

  int p0 = do_bifurcation(x);  // There is a path where p0 is 0.

  int div = p0 * y; // So, div also becomes 0 on that path.
  return 1 / div;
}

However, the bugreport tells us that you do a division by zero, which was initialized a line above.

Do you think it is a related issue @NoQ?

No-no, TrackControlDependencyCondBRVisitor's purpose is completely different. It tracks symbols that didn't necessarily participate in the report but participated in conditional statements that lexically surround the report. It's used for explaining how we got ourselves into a given lexical spot but it doesn't explain why this is a problem.

We can get it out of the way:

A *a = P.get(); // no note expected
if (!P) {}
P->foo();

vs.

A *a = P.get(); // expected note
if (!a) {}
P->foo();

I suspect that it may still work and the reason this works is because we're not collapsing .get()'s return value to null when it's constrained to null. Given that the only interesting thing that could happen to the return value of .get() is getting constrained to null (because it's an rvalue, the programmer can't use it to overwrite the raw pointer value inside the pointer), it's either already null or you'd see the constraint tracked once you mark the symbol as interesting.

The reason i'd still not like this solution is because collapsing the symbol to null on .get() (if it's already constrained to null) is arguably the preferred behavior as it makes constraint solver's life easier. But that'd most likely break your tracking solution.

In D97183#2637341, @steakhal wrote:

Do you think it is a related issue @NoQ?

I don't see any smart pointers or null dereferences so... no? But there's definitely an issue with tracking in your example. It's definitely correct that div is initialized with zero on the path on which p0 is zero but the events that led to p0 being zero are not explained which is a bug that needs to be fixed.

clang/test/Analysis/smart-ptr-text-output.cpp
315–386	Looks like your git history is acting up. Your patch adds this test right? Are there more proposed changes in the cpp files that aren't currently highlighted for a similar reason? I'll try to play with your patch locally once this is fixed ^.^

Fixed up the git history

RedDocMD marked an inline comment as done.Mar 20 2021, 11:35 PM

RedDocMD added inline comments.

clang/test/Analysis/smart-ptr-text-output.cpp
315–386	Yeah I seem to have tripped over the single commit rule. It should be fixed now.

Harbormaster completed remote builds in B94890: Diff 332147.Mar 21 2021, 12:24 AM

Re-formatted file

Harbormaster completed remote builds in B94891: Diff 332148.Mar 21 2021, 3:03 AM

Added some more tests

Harbormaster completed remote builds in B94929: Diff 332193.Mar 21 2021, 10:46 PM

@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)

void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}

When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?

Changed approach to use visitor

Harbormaster completed remote builds in B95171: Diff 332538.Mar 22 2021, 11:26 PM

Repaired git history

Removed extra includes

@NoQ, I have taken a different approach this time. I have used a visitor and am storing some more data in the GDM. Together, they distinguish between the following three cases:

If the raw pointer obtained from get() is constrained to null in a path which leads to a Node (and thus State) where a smart-pointer-null-deref bug occurs.
If the raw pointer was null to begin with (because the smart-pointer was null)
If the raw pointer was not null to begin with but the smart-ptr became null after that.

Only in the first case should the note be emitted. I have added some more tests to that effect.
Can you please have a look at this?

Harbormaster completed remote builds in B95173: Diff 332540.Mar 23 2021, 12:18 AM

Harbormaster completed remote builds in B95175: Diff 332542.Mar 23 2021, 12:32 AM

In D97183#2640559, @RedDocMD wrote:
@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)
void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}
When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?

When the if's condition is evaluated, it probably triggered a state split. On one path the aptr (aka. the inner pointer) will be constrained to null.
The only way to be sure is by checking the exploded graph and see where it goes.

In D97183#2643671, @RedDocMD wrote:

@NoQ, I have taken a different approach this time. I have used a visitor and am storing some more data in the GDM. Together, they distinguish between the following three cases:

If the raw pointer obtained from get() is constrained to null in a path which leads to a Node (and thus State) where a smart-pointer-null-deref bug occurs.

If the raw pointer was null to begin with (because the smart-pointer was null)

If the raw pointer was not null to begin with but the smart-ptr became null after that.

Only in the first case should the note be emitted. I have added some more tests to that effect.

All in all, I see where it's going. I don't know, it might be the right thing to do. I haven't spent much time on this topic though.
See my inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
267	I'm not sure if we should expect 16 unique places where `uptr::get()` is called on a path. I would guess 4 or 2 is more than enough.
286–289	So you are trying to find the assignment, where the inner pointer is assigned to a variable. This visitor logic seems to be somewhat convoluted. What you want to achieve is slightly similar to `FindLastStoreBRVisitor`. You should have a look at that.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
479	Nit: Declare the variable as close to the usage as you can. In the narrowest scope as well.
494–495	Why don't you 'save' the MemRegion only if the inner pointer is not proven to be null. This would relieve you from checking it later. Nit: I don't like such if branches. The last statement is identical, which is a code smell. It's better to think of this as a function taking stuff and producing a State. An immediately called lambda expression would enclose any local variables and it would suggest that the algorithm it implements is self-contained. I know that I'm the only immediately called lambda expression fan though.
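For readers unfamiliar with the immediately called lambda suggested above, here is a minimal sketch of the shape. `ToyState` and `updateState` are hypothetical and only mimic the idea of "take stuff, produce a State"; they are not the code under review.

```cpp
#include <cassert>

// Hypothetical state: remembers at most one region id (0 means nothing saved).
struct ToyState {
  int Saved = 0;
};

// Instead of two if-branches that both end with the same final statement,
// compute the new state inside a lambda that is called on the spot. Its
// locals stay enclosed and the result can be bound to a const variable.
ToyState updateState(const ToyState &State, bool InnerIsNull, int Region) {
  assert(Region > 0 && "0 is reserved for 'nothing saved' in this toy");
  const ToyState NewState = [&] {
    if (InnerIsNull)
      return State; // proven null: nothing worth saving
    ToyState S = State;
    S.Saved = Region; // save the region only when the pointer may be non-null
    return S;
  }();
  return NewState;
}
```

Whether this reads better than a plain if/else is a matter of taste, as the comment itself admits.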

RedDocMD marked 2 inline comments as done.Mar 25 2021, 5:16 AM

RedDocMD added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
267	Ok
286–289	That is what I had done before. @NoQ pointed out why this wouldn't work in a previous comment.

steakhal added inline comments.Mar 25 2021, 5:24 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289	Please elaborate on that. I'm not saying that an already existing visitor would perfectly fit your needs. I'm just curious why a similar logic would not work for you. You are trying to iterate over a bunch of decls and init exprs etc. And there is nothing similar to the visitor I mentioned.

RedDocMD marked 3 inline comments as done.Mar 25 2021, 6:09 AM

RedDocMD added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289	Sorry. I should have written it out better. "So you are trying to find the assignment, where the inner pointer is assigned to a variable." Yes and no. I am indeed trying to find where the first assignment occurred, since re-assigning to the pointer obtained from `get()` doesn't give any information regarding the smart pointer being null or not. So what this visitor does is that it goes back to the original assignment to find out what SVal was bound to `a.get()`. The Environment doesn't have this info since it is garbage collected somewhere on the way. Also accessing this State allows me to check whether the SVal was null to begin with. I don't think `FindLastStoreBRVisitor` does this.

@NoQ, what do you think?

In D97183#2650172, @steakhal wrote:
In D97183#2640559, @RedDocMD wrote:
@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)
void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}
When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?
When the if's condition is evaluated, it probably triggered a state split. On one path the aptr (aka. the inner pointer) will be constrained to null.
The only way to be sure is by checking the exploded graph and see where it goes.

Yes, that's, like, the whole point. We report unchecked use after a check. If the pointer is never null, why check? If the pointer is sometimes null, why use without a check? The code clearly doesn't make sense. That's what the report says: "assuming 'aptr' is null, there's null dereference on the next line". Once our simulation leaves the lexical scope of the if-condition 'aptr' doesn't suddenly become non-null. Technically speaking, we have no notion of a merge at all. There is literally no merge operation defined on our ProgramState, we do not perform fixpoint iteration, we do not try to combat path explosion, we simply simulate the behavior of the program on a few possible execution paths (defined by branching in the program) and report potential issues.
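The absence of a merge can be illustrated with a toy path explorer for the g() example. It is a deliberately tiny model under stated assumptions: `Nullness`, `ToyPath`, `branchOnNull` and `countNullDerefReports` are invented names, and the only fact tracked per path is what is known about the inner pointer.

```cpp
#include <cassert>
#include <vector>

// What one simulated path knows about the smart pointer's inner pointer.
enum class Nullness { Unknown, Null, NonNull };

struct ToyPath {
  Nullness Inner;
};

// Evaluating 'if (!aptr)' on an unknown pointer forks the path in two, one
// per assumption. A path that already knows the answer does not fork.
std::vector<ToyPath> branchOnNull(const ToyPath &P) {
  if (P.Inner != Nullness::Unknown)
    return {P};
  return {ToyPath{Nullness::Null}, ToyPath{Nullness::NonNull}};
}

// Simulates:  A *aptr = a.get();  [if (!aptr) {}]  a->foo();
// Returns the number of paths on which the dereference is a known null deref.
int countNullDerefReports(bool HasNullCheck) {
  std::vector<ToyPath> Paths = {ToyPath{Nullness::Unknown}};
  if (HasNullCheck) {
    std::vector<ToyPath> AfterIf;
    for (const ToyPath &P : Paths)
      for (const ToyPath &Q : branchOnNull(P))
        AfterIf.push_back(Q); // empty if-body; paths are never merged back
    Paths = AfterIf;
  }
  assert(!Paths.empty());
  int Reports = 0;
  for (const ToyPath &P : Paths)
    if (P.Inner == Nullness::Null)
      ++Reports; // the assumption made at the 'if' is still in force here
  return Reports;
}
```

With the check present, the "null" path survives past the empty if-body and reaches the dereference; without the check the pointer stays unknown and nothing is reported, which matches the "unchecked use after a check" reasoning.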

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp
286–289	I think it's too late to act when `.get()` is already happening. Like, we're visiting from bottom to top, in reverse to how our normal abstract interpretation goes. So if we only start tracking when we reach `.get()` we won't be able to explain what happens to the pointer between obtaining it from `.get()` and constraining it to null. For instance, if it was moved from one raw pointer variable to another, we won't put a note there, but we should. Let's try the following. Write a visitor to detect the moment of time when the raw pointer value is getting constrained to null. I.e., find when `State->assume(State->get<TrackedRegionMap>(ThisRegion))` stops working. It can be for two reasons: either the inner value is overwritten or it's constrained to null. If it's overwritten, track the newly set value and our job is done. If it's constrained to null, try to find out what's happening (is it an if-statement? is it an eagerly-assume action over a comparison operator?). We already have a common visitor that's good at figuring this out, maybe it'd be possible to reuse the code. In any case, start tracking the symbol and possibly emit a checker-specific note immediately ("raw pointer value constrained to null" or something like that).

Right, sorry for the late reply, @NoQ.
I will get to it once I get these assignments off my head.

For the following function:

void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}

Where do we expect a note? Where praw is initialized, where other is initialized or both?

In D97183#2699080, @RedDocMD wrote:
For the following function:
void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}
Where do we expect a note? Where praw is initialized, where other is initialized or both?

I would expect no notes at all, since there is no bug.

In D97183#2699336, @steakhal wrote:
In D97183#2699080, @RedDocMD wrote:
For the following function:
void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}
Where do we expect a note? Where praw is initialized, where other is initialized or both?
I would expect no notes at all, since there is no bug.

According to the existing analyzer logic, there is a bug. If you check other for null, we can conclude that there are circumstances when it is null indeed.

In D97183#2700810, @vsavchenko wrote:
In D97183#2699336, @steakhal wrote:
In D97183#2699080, @RedDocMD wrote:
For the following function:
void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}
Where do we expect a note? Where praw is initialized, where other is initialized or both?
I would expect no notes at all, since there is no bug.
According to the existing analyzer logic, there is a bug. If you check other for null, we can conclude that there are circumstances when it is null indeed.

I think we can conclude that P must be non-null (since it was unconditionally dereferenced), thus the previous check on the inner pointer and the branch it guards must be dead! This fact deserves a report, you are right. My bad.

In this case, the report should show how the inner pointer got bound to the other. Thus, we should highlight both assignments.

In D97183#2701441, @steakhal wrote:

I think we can conclude that P must be non-null (since it was unconditionally dereferenced), thus the previous check on the inner pointer and the branch it guards must be dead!

Under the same logic we also can't report null dereference in the following code:

void bar() {
  A *p = nullptr;
  p->foo();
}

Indeed, the null pointer p is unconditionally dereferenced, therefore the entire function bar() must be dead!

Or maybe the entire executable binary into which this code is linked is never run. Some users definitely complain about the static analyzer analyzing code that was entirely dead from the start, and have suggested integrating with the dynamic PGO facilities to analyze hot code first.

It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".

It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".

Tbh, given how loose of a memory model we are dealing with (at its worst, it is the C memory model), I think the static-analyzer does a great job at detecting what it possibly can. As for the user's expectation, I think we just need to wait for more adoption of the static analyzer. Then users will know exactly what to expect (we do not yell at the C++ compiler for not preventing use after delete, we do not yell at the Rust compiler for not allowing mismatching lifetimes - we know what to expect, and just work with that).

Changed to a different visitor

@NoQ, I have changed to the visitor that you suggested.

Harbormaster completed remote builds in B99760: Diff 338933.Apr 20 2021, 12:31 PM

NoQ mentioned this in D100852: [analyzer] Track leaking object through stores.Apr 21 2021, 1:08 AM

@NoQ?
(I actually should remove some extra includes and extra member fields)

@NoQ ?

NoQ added inline comments.May 3 2021, 1:55 PM

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
183–185	`SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());`.
203–207	Your visitor doesn't need to track raw pointers through raw variables. `trackExpressionValue()` is fully capable of doing this. The problem only becomes checker-specific when the connection between smart pointers and raw pointers is involved. I meant assignments between two smart pointers. Maybe even reduce the visitor to only the if-statement case and have it check all interesting smart pointers instead of a specific smart pointer. Propagation of interestingness across smart pointers can be handled by note tags for move/copy assignments and constructors.

Added a test for multiple get's

Herald added a reviewer: teemperor. · View Herald TranscriptMay 19 2021, 10:28 PM

Herald added a subscriber: manas. · View Herald Transcript

Removed unnecessary includes

Harbormaster completed remote builds in B105357: Diff 346633.May 20 2021, 12:35 AM

vsavchenko added inline comments.May 20 2021, 1:52 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37	nit: class name should be a noun (functions and methods are verbs)
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
86	Probably forgotten
149–152	We probably should have a checker in clang-tidy (maybe we already do), for situations like this. C++ has a long-lasting tradition that set's/map's `insert` return a pair: iterator + bool. The second part tells the user if the `insert` operation actually added the new element or it was already there. This allows us to do 1 search instead of two.
154–176	Is there any reason it's not a method of `FindWhereConstrained`?
178	After that you have 3 distinct cases to handle. It's a good opportunity for extracting them into separate functions.
192–193	I think it's better to `IgnoreParensAndCasts` instead of manual traversal.
229	Variables are capitalized.
230	It is a widespread pattern in LLVM to declare such variables directly in `if` statements: if (auto Report = bugReportOnGet(RHS)) return Report;
240	So, situations like `int a = nullptr, b = smart.get();` are not supported?
242	`llvm::find_if`
251–258	This level of nestedness is frowned upon. It is a good tell that the function should be refactored. The following code: if (cond1) { . . . if (cond2) { . . . if (cond3) { . . . } } } return nullptr; can be refactored into: if (!cond1) return nullptr; . . . if (!cond2) return nullptr; . . . if (!cond3) return nullptr; . . . It is easier to follow the logic if the function is composed in this manner because from the very beginning you know that `else` with more stuff is not going to follow.
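
As an illustration of the `insert` remark on lines 149–152 above, here is a minimal standalone sketch using `std::set` rather than the LLVM containers from the patch. The helper names `markCovered` and `markCoveredTwoLookups` are hypothetical and exist only for this example.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical helper: records a statement label and reports whether it was
// seen for the first time. insert() returns {iterator, bool}; the bool is true
// only if the element was actually added, so a single lookup is enough.
bool markCovered(std::set<std::string> &Covered, const std::string &Label) {
  auto Result = Covered.insert(Label);
  assert(Covered.count(Label) == 1 && "label must be present after insert");
  return Result.second;
}

// The two-lookup variant that the review comment argues against: one search
// in find() and a second one inside insert().
bool markCoveredTwoLookups(std::set<std::string> &Covered,
                           const std::string &Label) {
  if (Covered.find(Label) != Covered.end())
    return false;
  Covered.insert(Label);
  return true;
}
```

Both helpers return the same answers; the first one simply avoids searching the container twice.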

Code clean up

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
154–176	Not really.
192–193	What is `IgnoreParensAndCasts`? I didn't find it in the source code anywhere (ripgrep, that is).
240	No it works even in that case (I have added a test for that). It's got to do with how the AST data structures are (`int a = nullptr, b = smart.get();` is considered a single decl).
242	Not sure if that'll work neatly since I actually need the return value of the predicate function (the report).
251–258	Do you still think that's the case now? (After breaking it into functions). I also think that for the sort of pattern matching we are doing in this patch, the nested if's make more sense.

Removed un-necessary includes

Harbormaster completed remote builds in B105481: Diff 346817.May 20 2021, 2:05 PM

teemperor added inline comments.May 21 2021, 1:04 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
192–193	Just a typo, the actual name is `IgnoreParenCasts` (Expr::IgnoreParenCasts)
255	LLVM-code style mandates no curly braces around single-line ifs.
268	(I think this was already pointed out, but early exits are the way to go in LLVM.) `const auto *DS = llvm::dyn_cast<DeclStmt>(S); if (!DS) return nullptr; const Decl *D = DS->getSingleDecl(); const auto *VD = llvm::dyn_cast<VarDecl>(D); if (!VD) return nullptr; ...`

Right, @teemperor, I will do the refactors once we figure out how to utilize trackExpressionValue() here (because that will eliminate quite a bit of code from GetNoteVisitor, so no point in refactoring those bits).

Refactors to make code more stylistically accurate

Removed unnecessary include

I have put in the refactors, @teemperor.

Harbormaster completed remote builds in B105813: Diff 347260.May 23 2021, 1:55 PM

vsavchenko added inline comments.May 26 2021, 1:54 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38	Sorry for picking on it again, but is it visiting "get notes"? Otherwise, it is just a placeholder name that doesn't tell the reader what this class is actually for. Also, is there a reason why it is declared in the header file and couldn't be a part of `SmartPtrChecker.cpp` instead?
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79	I think it's better to use `REGISTER_SET_FACTORY_WITH_PROGRAMSTATE` instead and keep factory as part of the state as well.
169	Functions are actions, it is better to express actions with verbs. Additionally, I don't think that it really reflects what it does without the verb either.
175–177	This type of predicates shouldn't be scattered throughout the code here and there. It should be definitely unified and put into a function that is shared with the checker and other parts of the model. One simple example here - what if we need to support other types of smart pointers? Should we go around all over the code looking for the places like this or fix it in one place? It also naturally creates a question "What about `shared_ptr`"?
192	Just wondering if `return {};` will be sufficient here.
200–203	I think it's better to unite these two into `if (!E || E->getCastKind()...)`
206–208	I guess it escaped during code refactoring: `SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());`
224–227	Similar note here: `if (!BO || BO->getOpcode()...)`
230–233	And here
499–500	I generally don't like repeating code in both branches of the `if` statement. Here they share the following logic: add `CallExpr` and update the state. We can easily update the code to share the code: const auto ExistingSet = State->get<ExprsFromGet>(ThisRegion); auto BaseSet = ExistingSet ? ExistingSet : StmtSetFactory.getEmptySet(); auto NewStmtSet = StmtSetFactory.add(BaseSet, CallExpr); State = State->set<ExprsFromGet>(ThisRegion, NewStmtSet);

More refactoring

Removed extra include

Harbormaster completed remote builds in B106438: Diff 348166.May 26 2021, 11:05 PM

vsavchenko added inline comments.May 27 2021, 1:32 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37	It is again some sort of verb. I don't know how to read it except for "emit note on get"-visitor. What about something like `GetNoteEmitter`? I mean do we really need to put `Visitor` in the name?
38	This comment is marked as "Done", but there is no code change nor justification.
57	IMO, these three `visit` functions are a bit confusing in their names. I guess, my expectation for a group of methods that have very similar names is that they have similar signatures. And here `visitIfStmt` doesn't return diagnostic piece as opposed to two other methods. We are not bound here to name everything `visitABC` since these methods are not part of the visitor interface. My suggestion here is to actually keep `visitIfStmt` because that's what you actually do here, you simply visit it. Maybe change the return type to bool (signifying that it was indeed an if statement), so we don't check for other possibilities. As for other two functions, I'd prefer something like `emitNoteForAssignment` and `emitNoteForInitialization` to keep the naming pattern going.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79	It is marked as "Done", but the code is same.
111	My guess is that it should be just `isStdSmartPtr`
171	I'm still here picking on names 😅 It doesn't emit "bug report" it emits "note". And actually if you name your class as `GetNoteEmitter` or something similar you don't need to mention `get` here again.
198–202	I have two major questions about this implementation: Why don't we need an actual check for `IfStmt`? Won't it trigger on `bool unused = !pointer;`? And if so it won't mean constrained. Why do we only care about implicit pointer-to-bool conversion? What about situations like `pointer == nullptr`, `NULL != pointer`, `__builtin_expect(pointer, 0)`, etc?
230	It's better to use `IgnoreParenCasts` here as well.
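
To make the question on lines 198–202 concrete, the sketch below spells out several syntactic forms of the same null check in plain C++. It is not analyzer code: the function names are hypothetical, and `__builtin_expect` is left out because it is compiler-specific. The point is only that these forms agree at run time while looking different in the AST, and that the `!P` form involves no `if` statement at all.

```cpp
#include <cassert>
#include <memory>

// Implicit pointer-to-bool conversion inside an if condition.
bool viaImplicitConversion(const int *P) {
  if (P)
    return false;
  return true;
}

// Pointer-to-bool conversion without any if statement.
bool viaNegation(const int *P) {
  bool IsNull = !P;
  return IsNull;
}

// Explicit comparisons against nullptr, in both operand orders.
bool viaEquality(const int *P) { return P == nullptr; }
bool viaReversedInequality(const int *P) { return !(nullptr != P); }

// All forms must agree on whether the pointer is null.
bool isNullByAnyForm(const int *P) {
  bool Result = viaEquality(P);
  assert(Result == viaImplicitConversion(P) && Result == viaNegation(P) &&
         Result == viaReversedInequality(P));
  return Result;
}

// Applies the checks to the raw pointer obtained from a smart pointer.
bool innerPointerIsNull(const std::unique_ptr<int> &P) {
  return isNullByAnyForm(P.get());
}
```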

RedDocMD added inline comments.May 27 2021, 1:34 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38	Yes, it is the line `const SVal *InnerPtr = State->get<TrackedRegionMap>(ThisRegion);` in the `VisitNode` function. This uses the `TrackedRegionMap`, which is defined in `SmartPtrModeling.cpp`, but the visitor must be added to the bug report in `SmartPtrChecker.cpp`.
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
79	My bad, I should have put in the TODO.
192	Yup
499–500	Right, thanks :)

My bad! I forgot to submit the replies.

RedDocMD marked 9 inline comments as done.May 27 2021, 1:42 AM

RedDocMD added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37	I put in `Visitor` because all the visitors that I have come across end with Visitor. But then again, I am sure that if we look, we'll find counter-examples. So your call here ...

RedDocMD marked an inline comment as done.May 27 2021, 1:43 AM

vsavchenko added inline comments.May 27 2021, 1:49 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
38	I see. IMO it still means that the visitor belongs with the checker and was put into this header as a workaround. So maybe instead we can add `getInnerPointer(const ProgramStateRef State, const MemRegion *ThisRegion)` since we already have a very similar `isNullSmartPtr` in this header.

vsavchenko added inline comments.May 27 2021, 1:57 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
37	That's true, but it doesn't mean that we shouldn't care about the names and how they read. You cannot say that you visit "emit note on get", so the name doesn't help to understand what this class does. If you want to name it `GetVisitor` - no problem because it does visit `get`s. But if you want to state in the name that it emits notes on gets, the name should say `Emitter`, and that second name is more specific.

A brief summary of an offline discussion we recently had.

(1) Basically we figured out that it's still necessary to do something like I originally suggested:

In D97183#2598806, @NoQ wrote:

We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode. That'd be a completely new form of interestingness for us to track. Or maybe pairs (exploded node, expression) so that to be more specific. Then we could query it from our note tag.

This is necessary because the symbol produced by .get() is not immediately collapsed to a constant and it remains interesting as a symbol for the entire duration of the new visitor's lifetime, but there may be unrelated .get()s on the same smart pointer during said lifetime that don't deserve a note despite producing the same symbol.

(2) We also came up with a different approach to communicating with trackExpressionValue(). First of all, we probably don't need to mark all nodes/expressions on which trackExpressionValue() switches modes as interesting; we're only interested in the spot where tracking ends. This happens because the checker fully models .get() and therefore it's impossible for a generic solution like trackExpressionValue() to proceed with tracking as that would have required checker-specific machinery. We could reduce the scope of proposal (1) by only marking the last node as interesting but I have a better idea: let's add a callback to trackExpressionValue() that's invoked once tracking ends. In our case such callback would attach a checker-specific visitor to the smart pointer which solves our problem perfectly.

Such a callback could be useful in a lot more cases though, because it provides us with an extremely generic benefit of knowing the origin of the value. We already demand such knowledge in a number of other machines that are currently hard-coupled to trackExpressionValue(): namely, I'm talking about inlined defensive check suppressions. Both of these suppressions basically say "if a null/zero value originates from a nested function call that was exited before the bug node, suppress the warning". These suppressions don't care where the value was passing through, they only care where it originated from. As such, by providing a callback for the origin of the value, we could decouple these suppressions and possibly even move them into the respective checkers (e.g., the null dereference checker). I think this could be an excellent refactoring pass.
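
The callback idea in (2) can be pictured with a toy model. Everything in the sketch below is hypothetical (`Step`, `StepKind`, `trackValue`, and the callback signature are invented for illustration) and it does not reflect the real `trackExpressionValue()` interface. It only shows the shape of the proposal: a generic tracker follows plain copies, stops where checker-specific modeling takes over, and hands that final step to a callback.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Toy model, not the analyzer's API. A step is either a plain copy that a
// generic tracker can follow, or something only a checker understands.
enum class StepKind { Copy, CheckerModeled };

struct Step {
  StepKind Kind;
  std::string What;
};

using TrackingEndCallback = std::function<void(const Step &)>;

// Walks the steps from the bug site backwards (Path.front() is nearest to the
// bug). Returns how many copy steps were followed. The callback is invoked
// once, with the step at which generic tracking had to stop.
std::size_t trackValue(const std::vector<Step> &Path,
                       const TrackingEndCallback &OnTrackingEnd) {
  assert(OnTrackingEnd && "a callback is required");
  std::size_t Followed = 0;
  for (const Step &S : Path) {
    if (S.Kind == StepKind::CheckerModeled) {
      OnTrackingEnd(S); // Checker-specific machinery takes over from here.
      return Followed;
    }
    ++Followed;
  }
  if (!Path.empty())
    OnTrackingEnd(Path.back()); // Ran out of history: last step is the origin.
  return Followed;
}
```

In this picture, the smart pointer checker would supply a callback that attaches its own visitor when the final step is a `.get()` call it modeled.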

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h
61–64	Typo!
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258	That's not what the assert is saying; the assert is saying that the `DeclStmt` has exactly one `Decl`. It basically forbids code like `int x = 1, y = 2;`. You may wonder why you don't crash all over the place. That's because Clang CFG creates its own `DeclStmt`s that aren't normally present in the AST, that always have exactly one declaration. This is necessary because there may be non-trivial control flow between these declarations (due to, say, presence of operator `?:` in the initializer) so they have to be represented as different elements (possibly even in different blocks) in the CFG.

RedDocMD added inline comments.May 27 2021, 11:22 AM

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258	So I guess the tests at lines `317` and `378` of `smart-ptr-text-output.cpp` work because of the CFG messing with the AST? So should I remove the assert?

RedDocMD marked 7 inline comments as done.Jun 2 2021, 11:15 AM

RedDocMD added inline comments.

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258	@NoQ?

Moved visitor entirely to SmartPtrChecker.cpp, other refactors, better naming.

Important question from @vsavchenko:

I have two major questions about this implementation:

Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.

Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?

Harbormaster completed remote builds in B107290: Diff 349332.Jun 2 2021, 12:32 PM

In D97183#2794256, @RedDocMD wrote:

Important question from @vsavchenko:

I have two major questions about this implementation:

Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.

Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?

I think there's no way around re-using/generalizing the logic from ConditionBRVisitor::VisitNode in some form. I guess you could try to separate the part where it looks at the current program point and finds out what's constrained. Then apply it to the moment of time where the interesting constraint appears (whereas ConditionBRVisitor continuously scans all program points with the same hopefully-reusable logic).

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp
258	"So I guess the tests at lines 317 and 378 of smart-ptr-text-output.cpp work because of the CFG messing with the AST?" Yes. The rest of the static analyzer works for the same reason; a lot of code relies on it. "So should I remove the assert?" The assert is correct but the message is wrong / misleading.

Revision Contents

Path	Size
clang/lib/StaticAnalyzer/Checkers/SmartPtr.h	11 lines
clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp	181 lines
clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp	44 lines
clang/test/Analysis/smart-ptr-text-output.cpp	73 lines

Diff 349332

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h

#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"

namespace clang {
namespace ento {
namespace smartptr {

/// Returns true if the event call is on smart pointer.
bool isStdSmartPtrCall(const CallEvent &Call);
bool isStdSmartPtr(const CXXRecordDecl *Rec);

/// Returns whether the smart pointer is null or not.
bool isNullSmartPtr(const ProgramStateRef State, const MemRegion *ThisRegion);

const BugType *getNullDereferenceBugType();

// TODO: Get rid after D97183
// Returns whether E is of the form a.get(), where the MemRegion corresponding
// to the smart-ptr a is ThisRegion.
bool isExprObtainedFromGet(const ProgramStateRef State,
                           const MemRegion *ThisRegion, const Expr *E);

/// Returns the SVal of a MemRegion from the TrackedRegionMap

const SVal *getInnerPointer(const ProgramStateRef State,
                            const MemRegion *ThisRegion);

} // namespace smartptr
} // namespace ento
} // namespace clang

#endif // LLVM_CLANG_LIB_STATICANALYZER_CHECKERS_SMARTPTR_H

  const SVal InnerPtrVal);
- PathDiagnosticPieceRef visitAssgnStmt(const ExplodedNode *Node,
-                                       const ProgramStateRef State,
-                                       BugReporterContext &BRC, const Stmt *S,
-                                       const SVal InnerPtrVal);
+ PathDiagnosticPieceRef visitAssignStmt(const ExplodedNode *Node,
+                                        const ProgramStateRef State,
+                                        BugReporterContext &BRC, const Stmt *S,
+                                        const SVal InnerPtrVal);
  PathDiagnosticPieceRef visitDeclStmt(const ExplodedNode *Node,

NoQ: Typo!

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp

// Define the inter-checker API.
namespace clang {
namespace ento {
namespace smartptr {

const BugType *getNullDereferenceBugType() { return NullDereferenceBugTypePtr; }

class GetNoteEmitter : public BugReporterVisitor {
private:
  const MemRegion *ThisRegion;
  llvm::SmallPtrSet<const ValueDecl *, 4> PtrsTrackedAndConstrained;
  llvm::SmallPtrSet<const Stmt *, 16> StmtsCovered;

public:
  GetNoteEmitter(const MemRegion *ThisRegion)
      : ThisRegion{ThisRegion}, PtrsTrackedAndConstrained{}, StmtsCovered{} {}
  PathDiagnosticPieceRef VisitNode(const ExplodedNode *Node,
                                   BugReporterContext &BRC,
                                   PathSensitiveBugReport &BR) override;

  PathDiagnosticPieceRef emitNote(const ExplodedNode *Node,
                                  BugReporterContext &BRC, const Expr *E);

  bool visitIfStmt(const ExplodedNode *Node, const ProgramStateRef State,
                   BugReporterContext &BRC, const Stmt *S,
                   const SVal InnerPtrVal);

  PathDiagnosticPieceRef emitNoteForAssignment(const ExplodedNode *Node,
                                               const ProgramStateRef State,
                                               BugReporterContext &BRC,
                                               const Stmt *S,
                                               const SVal InnerPtrVal);

  PathDiagnosticPieceRef emitNoteForInitialization(const ExplodedNode *Node,
                                                   const ProgramStateRef State,
                                                   BugReporterContext &BRC,
                                                   const Stmt *S,
                                                   const SVal InnerPtrVal);

  void Profile(llvm::FoldingSetNodeID &ID) const override {
    ID.Add(ThisRegion);
  }
};

PathDiagnosticPieceRef GetNoteEmitter::VisitNode(const ExplodedNode *Node,
                                                 BugReporterContext &BRC,
                                                 PathSensitiveBugReport &BR) {

  ProgramStateRef State = Node->getState();
  const SVal *InnerPtr = getInnerPointer(State, ThisRegion);
  if (InnerPtr) {
    SVal InnerPtrVal = *InnerPtr;
    ProgramStateRef NullState, NonNullState;
    std::tie(NullState, NonNullState) =
        State->assume(InnerPtrVal.castAs<DefinedOrUnknownSVal>());

    // Check whether we have a constraint on ThisRegion
    if (NullState && NonNullState) {
      if (const Stmt *S = Node->getStmtForDiagnostics()) {
        // Skip if we already have covered this statement
        auto ItInsertedPair = StmtsCovered.insert(S);
        if (!ItInsertedPair.second)
          return nullptr;

        // If statement on raw pointer
        visitIfStmt(Node, State, BRC, S, InnerPtrVal);

        // Assignment operator
        if (auto Report =
                emitNoteForAssignment(Node, State, BRC, S, InnerPtrVal))
          return Report;

        // Variable declaration
        if (auto Report =
                emitNoteForInitialization(Node, State, BRC, S, InnerPtrVal))
          return Report;
      }
    }
  }
  return nullptr;
}

PathDiagnosticPieceRef GetNoteEmitter::emitNote(const ExplodedNode *Node,
                                                BugReporterContext &BRC,
                                                const Expr *E) {
  if (const auto *MCE = llvm::dyn_cast<CXXMemberCallExpr>(E)) {
    const auto *Method = MCE->getMethodDecl();
    const auto *Record = MCE->getRecordDecl();
    if (Method->getName() == "get" && isStdSmartPtr(Record)) {
      llvm::SmallString<128> Str;
      llvm::raw_svector_ostream OS(Str);
      if (ThisRegion->canPrintPretty()) {
        OS << "Obtained null inner pointer from ";
        ThisRegion->printPretty(OS);
      } else {
        OS << "Obtained null inner pointer here";
      }
      const Stmt *S = Node->getStmtForDiagnostics();
      PathDiagnosticLocation Pos(S, BRC.getSourceManager(),
                                 Node->getLocationContext());
      return std::make_shared<PathDiagnosticEventPiece>(Pos, OS.str(), true);
    }
  }
  return {};
}

bool GetNoteEmitter::visitIfStmt(const ExplodedNode *Node,
                                 const ProgramStateRef State,
                                 BugReporterContext &BRC, const Stmt *S,
                                 const SVal InnerPtrVal) {
  const auto *E = llvm::dyn_cast<ImplicitCastExpr>(S);
  if (!E || E->getCastKind() != CastKind::CK_PointerToBoolean)
    return false;
  // Check if we are tracking the expression being casted
  const Expr *Sub = E->getSubExpr();
  SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());
  if (ExprVal != InnerPtrVal)
    return false;
  // So we have a pointer being used in an if statement
  // And that pointer is being tracked to the same SVal as
  // ThisRegion Also it is at this if statement that the
  // constraining starts So we know that this pointer points to
  // P.get()
  if (const auto *Ptr = llvm::dyn_cast<DeclRefExpr>(Sub->IgnoreParenCasts())) {
    PtrsTrackedAndConstrained.insert(Ptr->getDecl());
    return true;
  }
  return false;
}

PathDiagnosticPieceRef GetNoteEmitter::emitNoteForAssignment(
    const ExplodedNode *Node, const ProgramStateRef State,
    BugReporterContext &BRC, const Stmt *S, const SVal InnerPtrVal) {
  const auto *BO = llvm::dyn_cast<BinaryOperator>(S);
  if (!BO || BO->getOpcode() != BO_Assign)
    return nullptr;
  const Expr *LHS = BO->getLHS();
  const auto *DeclRef = llvm::dyn_cast<DeclRefExpr>(LHS);
  if (!DeclRef || !PtrsTrackedAndConstrained.contains(DeclRef->getDecl()))
    return nullptr;
  const Expr *RHS = BO->getRHS();
  // So now we have an assignment of the form a = b,
  // where a is known to be tracked and constrained

  // If b is just a pointer, then we should add it to the set
  const Expr *Sub = RHS->IgnoreParenCasts();
  if (const auto *Ptr = llvm::dyn_cast<DeclRefExpr>(Sub)) {
    PtrsTrackedAndConstrained.insert(Ptr->getDecl());
    llvm::SmallString<128> Str;
    llvm::raw_svector_ostream OS(Str);
    OS << "Obtained null value here";
    PathDiagnosticLocation Pos(S, BRC.getSourceManager(),
                               Node->getLocationContext());
    return std::make_shared<PathDiagnosticEventPiece>(Pos, OS.str(), true);
  }

  // If b is a get() expression, then we can return a note
  if (auto Report = emitNote(Node, BRC, RHS))
    return Report;

  return nullptr;
}

PathDiagnosticPieceRef GetNoteEmitter::emitNoteForInitialization(
    const ExplodedNode *Node, const ProgramStateRef State,
    BugReporterContext &BRC, const Stmt *S, const SVal InnerPtrVal) {
  const auto *DS = llvm::dyn_cast<DeclStmt>(S);
  if (!DS)
    return nullptr;
  const Decl *D = DS->getSingleDecl();
  assert(D && "DeclStmt should have at least one Decl");
  const auto *VD = llvm::dyn_cast<VarDecl>(D);
  if (!VD)
    return nullptr;
  for (const auto *I : PtrsTrackedAndConstrained) {
    if (I->getName() == VD->getName()) {
      const Expr *Init = VD->getAnyInitializer();
      if (!Init)
        continue;
      if (auto Report = emitNote(Node, BRC, Init))
        return Report;
      break;
    }
  }
  return nullptr;
}

} // namespace smartptr
} // namespace ento
} // namespace clang

void SmartPtrChecker::checkPreCall(const CallEvent &Call,
                                   CheckerContext &C) const {
  if (!smartptr::isStdSmartPtrCall(Call))
    return;
  // ... (15 lines not shown in the diff view) ...
void SmartPtrChecker::reportBug(CheckerContext &C, const MemRegion *DerefRegion,
                                const CallEvent &Call) const {
  ExplodedNode *ErrNode = C.generateErrorNode();
  if (!ErrNode)
    return;
  llvm::SmallString<128> Str;
  llvm::raw_svector_ostream OS(Str);
  explainDereference(OS, DerefRegion, Call);
  auto R = std::make_unique<PathSensitiveBugReport>(NullDereferenceBugType,
                                                    OS.str(), ErrNode);
  R->addVisitor(std::make_unique<smartptr::GetNoteEmitter>(DerefRegion));
  R->markInteresting(DerefRegion);
  C.emitReport(std::move(R));
}

void SmartPtrChecker::explainDereference(llvm::raw_ostream &OS,
                                         const MemRegion *DerefRegion,
                                         const CallEvent &Call) const {
  OS << "Dereference of null smart pointer ";
  DerefRegion->printPretty(OS);
}

void ento::registerSmartPtrChecker(CheckerManager &Mgr) {
  SmartPtrChecker *Checker = Mgr.registerChecker<SmartPtrChecker>();
  NullDereferenceBugTypePtr = &Checker->NullDereferenceBugType;
}

bool ento::shouldRegisterSmartPtrChecker(const CheckerManager &mgr) {
  const LangOptions &LO = mgr.getLangOpts();
  return LO.CPlusPlus;
}

steakhal: I'm not sure if we should expect 16 unique places where `uptr::get()` called on a path. I would guess 4 or 2 is more than enough.

RedDocMD: Ok
				steakhal (Done): So you are trying to find the assignment where the inner pointer is assigned to a variable. This visitor logic seems somewhat convoluted. What you want to achieve is similar to `FindLastStoreBRVisitor`; you should have a look at that.
				RedDocMD (Done): That is what I had done before. @NoQ pointed out in a previous comment why this wouldn't work.
				steakhal (Done): Please elaborate on that. I'm not saying that an existing visitor would perfectly fit your needs; I'm just curious why similar logic would not work for you. You are iterating over a bunch of decls, init exprs, etc., and there is nothing like that in the visitor I mentioned.
				RedDocMD (Done): Sorry, I should have written it out better.
				> So you are trying to find the assignment, where the inner pointer is assigned to a variable.
				Yes and no. I am indeed trying to find where the first assignment occurred, since re-assigning to the pointer obtained from `get()` gives no information about whether the smart pointer is null. So this visitor goes back to the original assignment to find out which SVal was bound to `a.get()`. The Environment doesn't have this info, since it is garbage collected somewhere along the way. Accessing this State also lets me check whether the SVal was null to begin with. I don't think `FindLastStoreBRVisitor` does this.
				NoQ (Done): I think it's too late to act when `.get()` is already happening. We're visiting from bottom to top, in reverse of how normal abstract interpretation goes. So if we only start tracking when we reach `.get()`, we won't be able to explain what happens to the pointer between obtaining it from `.get()` and constraining it to null. For instance, if it was moved from one raw pointer variable to another, we won't put a note there, but we should. Let's try the following. Write a visitor to detect the moment when the raw pointer value is getting constrained to null, i.e., find when `State->assume(State->get<TrackedRegionMap>(ThisRegion))` stops working. That can happen for two reasons: either the inner value is overwritten or it's constrained to null. If it's overwritten, track the newly set value and our job is done. If it's constrained to null, try to find out what's happening (is it an if-statement? is it an eagerly-assume action over a comparison operator?). We already have a common visitor that's good at figuring this out; maybe it'd be possible to reuse the code. In any case, start tracking the symbol and possibly emit a checker-specific note immediately ("raw pointer value constrained to null" or something like that).
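NoQ's proposal can be pictured with a small standalone model. This is only a sketch under stated assumptions: `PathState`, `Change`, `Finding` and `findWhereNull` are hypothetical names, the vector stands in for the bug path (root first, error node last), and `MayBeNonNull` stands in for whether `State->assume(...)` on the tracked inner value still succeeds. It does not use the real `BugReporterVisitor` API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical classification of what made the inner pointer null.
enum class Change { None, Overwritten, ConstrainedToNull };

// Toy stand-in for one node on the bug path (not an ExplodedNode).
struct PathState {
  int InnerValueId;  // identity of the value tracked for the smart pointer
  bool MayBeNonNull; // would assuming "non-null" still be feasible here?
};

struct Finding {
  std::size_t Index; // node at which non-null stopped being feasible
  Change Kind;
};

// Walk from the error node (last element) toward the root and report the
// first transition where assuming non-null stopped working.
Finding findWhereNull(const std::vector<PathState> &Path) {
  for (std::size_t I = Path.size(); I > 1; --I) {
    const PathState &Cur = Path[I - 1];
    const PathState &Pred = Path[I - 2];
    if (Cur.MayBeNonNull || !Pred.MayBeNonNull)
      continue; // nothing changed between Pred and Cur
    // A different tracked value means it was overwritten; the same value
    // means a constraint was added to it.
    Change Kind = Cur.InnerValueId != Pred.InnerValueId
                      ? Change::Overwritten
                      : Change::ConstrainedToNull;
    return Finding{I - 1, Kind};
  }
  return Finding{0, Change::None};
}
```

In the real checker the two outcomes would presumably lead to different actions (track the new value, or explain the constraint); the model only shows how the two cases can be told apart while walking backwards.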

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp

[...]	private:

using SmartPtrMethodHandlerFn =		using SmartPtrMethodHandlerFn =
void (SmartPtrModeling::*)(const CallEvent &Call, CheckerContext &) const;		void (SmartPtrModeling::*)(const CallEvent &Call, CheckerContext &) const;
CallDescriptionMap<SmartPtrMethodHandlerFn> SmartPtrMethodHandlers{		CallDescriptionMap<SmartPtrMethodHandlerFn> SmartPtrMethodHandlers{
{{"reset"}, &SmartPtrModeling::handleReset},		{{"reset"}, &SmartPtrModeling::handleReset},
{{"release"}, &SmartPtrModeling::handleRelease},		{{"release"}, &SmartPtrModeling::handleRelease},
{{"swap", 1}, &SmartPtrModeling::handleSwap},		{{"swap", 1}, &SmartPtrModeling::handleSwap},
{{"get"}, &SmartPtrModeling::handleGet}};		{{"get"}, &SmartPtrModeling::handleGet}};

		vsavchenko (Done): I think it's better to use `REGISTER_SET_FACTORY_WITH_PROGRAMSTATE` instead and keep the factory as part of the state as well.
		RedDocMD (Done): My bad, I should have put in the TODO.
		vsavchenko (Done): It is marked as "Done", but the code is the same.
		// TODO: Get rid after D97183
		mutable llvm::ImmutableSet<const Expr *>::Factory StmtSetFactory;
};		};
} // end of anonymous namespace		} // end of anonymous namespace

REGISTER_MAP_WITH_PROGRAMSTATE(TrackedRegionMap, const MemRegion *, SVal)		REGISTER_MAP_WITH_PROGRAMSTATE(TrackedRegionMap, const MemRegion *, SVal)

		vsavchenko (Done): Probably forgotten
		// TODO: Get rid of this once D97183 is settled
		REGISTER_MAP_WITH_PROGRAMSTATE(ExprsFromGet, const MemRegion *,
		llvm::ImmutableSet<const Expr *>)

// Define the inter-checker API.		// Define the inter-checker API.
namespace clang {		namespace clang {
namespace ento {		namespace ento {
namespace smartptr {		namespace smartptr {
bool isStdSmartPtrCall(const CallEvent &Call) {		bool isStdSmartPtrCall(const CallEvent &Call) {
const auto *MethodDecl = dyn_cast_or_null<CXXMethodDecl>(Call.getDecl());		const auto *MethodDecl = dyn_cast_or_null<CXXMethodDecl>(Call.getDecl());
if (!MethodDecl || !MethodDecl->getParent())		if (!MethodDecl || !MethodDecl->getParent())
return false;		return false;

const auto *RecordDecl = MethodDecl->getParent();		const auto *RecordDecl = MethodDecl->getParent();
if (!RecordDecl || !RecordDecl->getDeclContext()->isStdNamespace())		if (!RecordDecl || !RecordDecl->getDeclContext()->isStdNamespace())
return false;		return false;

if (RecordDecl->getDeclName().isIdentifier()) {		if (RecordDecl->getDeclName().isIdentifier()) {
StringRef Name = RecordDecl->getName();		StringRef Name = RecordDecl->getName();
return Name == "shared_ptr" || Name == "unique_ptr" || Name == "weak_ptr";		return Name == "shared_ptr" || Name == "unique_ptr" || Name == "weak_ptr";
}		}
return false;		return false;
}		}

		bool isStdSmartPtr(const CXXRecordDecl *Rec) {
		vsavchenko (Done): My guess is that it should be just `isStdSmartPtr`
		if (!Rec || !Rec->getDeclContext()->isStdNamespace())
		return false;
		if (Rec->getDeclName().isIdentifier()) {
		StringRef Name = Rec->getName();
		return Name == "shared_ptr" || Name == "unique_ptr" || Name == "weak_ptr";
		}
		return false;
		}

bool isNullSmartPtr(const ProgramStateRef State, const MemRegion *ThisRegion) {		bool isNullSmartPtr(const ProgramStateRef State, const MemRegion *ThisRegion) {
const auto *InnerPointVal = State->get<TrackedRegionMap>(ThisRegion);		const auto *InnerPointVal = State->get<TrackedRegionMap>(ThisRegion);
return InnerPointVal &&		return InnerPointVal &&
!State->assume(InnerPointVal->castAs<DefinedOrUnknownSVal>(), true);		!State->assume(InnerPointVal->castAs<DefinedOrUnknownSVal>(), true);
}		}

		// TODO: Get rid after D97183
		bool isExprObtainedFromGet(const ProgramStateRef State,
		const MemRegion *ThisRegion, const Expr *E) {
		const auto *ExprSet = State->get<ExprsFromGet>(ThisRegion);
		return ExprSet && ExprSet->contains(E);
		}

		const SVal *getInnerPointer(const ProgramStateRef State,
		const MemRegion *ThisRegion) {
		return State->get<TrackedRegionMap>(ThisRegion);
		}

} // namespace smartptr		} // namespace smartptr
} // namespace ento		} // namespace ento
} // namespace clang		} // namespace clang

// If a region is removed all of the subregions need to be removed too.		// If a region is removed all of the subregions need to be removed too.
static TrackedRegionMapTy		static TrackedRegionMapTy
removeTrackedSubregions(TrackedRegionMapTy RegionMap,		removeTrackedSubregions(TrackedRegionMapTy RegionMap,
TrackedRegionMapTy::Factory &RegionMapFactory,		TrackedRegionMapTy::Factory &RegionMapFactory,
const MemRegion *Region) {		const MemRegion *Region) {
if (!Region)		if (!Region)
return RegionMap;		return RegionMap;
for (const auto &E : RegionMap) {		for (const auto &E : RegionMap) {
if (E.first->isSubRegionOf(Region))		if (E.first->isSubRegionOf(Region))
RegionMap = RegionMapFactory.remove(RegionMap, E.first);		RegionMap = RegionMapFactory.remove(RegionMap, E.first);
		vsavchenko (Done): We should probably have a checker in clang-tidy (maybe we already do) for situations like this. C++ has a long-standing tradition that `insert` on sets and maps returns a pair: iterator + bool. The second part tells the user whether `insert` actually added the new element or it was already there. This allows us to do one search instead of two.
}		}
return RegionMap;		return RegionMap;
}		}
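vsavchenko's remark about `insert` returning an iterator/bool pair can be illustrated with a standard container. A minimal sketch, assuming two hypothetical helpers over `std::set<int>` (neither is code from the patch): the first does a single lookup, the second shows the two-lookup form the comment argues against.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper: returns true only if Value was newly added.
// insert() already reports whether the element was present, so one
// search is enough.
bool addIfNew(std::set<int> &Seen, int Value) {
  return Seen.insert(Value).second;
}

// Equivalent result, but the set is searched twice: once by count()
// and once more by insert().
bool addIfNewTwoLookups(std::set<int> &Seen, int Value) {
  if (Seen.count(Value))
    return false;
  Seen.insert(Value);
  return true;
}
```

Both helpers leave the set in the same state; the difference is only in how many times the container is searched.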

static ProgramStateRef updateSwappedRegion(ProgramStateRef State,		static ProgramStateRef updateSwappedRegion(ProgramStateRef State,
const MemRegion *Region,		const MemRegion *Region,
const SVal *RegionInnerPointerVal) {		const SVal *RegionInnerPointerVal) {
if (RegionInnerPointerVal) {		if (RegionInnerPointerVal) {
State = State->set<TrackedRegionMap>(Region, *RegionInnerPointerVal);		State = State->set<TrackedRegionMap>(Region, *RegionInnerPointerVal);
} else {		} else {
State = State->remove<TrackedRegionMap>(Region);		State = State->remove<TrackedRegionMap>(Region);
}		}
return State;		return State;
}		}

// Helper method to get the inner pointer type of specialized smart pointer		// Helper method to get the inner pointer type of specialized smart pointer
// Returns empty type if not found valid inner pointer type.		// Returns empty type if not found valid inner pointer type.
		vsavchenko (Done): Functions are actions; it is better to express actions with verbs. Additionally, I don't think the name really reflects what the function does without the verb either.
static QualType getInnerPointerType(const CallEvent &Call, CheckerContext &C) {		static QualType getInnerPointerType(const CallEvent &Call, CheckerContext &C) {
const auto *MethodDecl = dyn_cast_or_null<CXXMethodDecl>(Call.getDecl());		const auto *MethodDecl = dyn_cast_or_null<CXXMethodDecl>(Call.getDecl());
		vsavchenko (Done): I'm still here picking on names 😅 It doesn't emit a "bug report", it emits a "note". And if you name your class `GetNoteEmitter` or something similar, you don't need to mention `get` here again.
if (!MethodDecl || !MethodDecl->getParent())		if (!MethodDecl || !MethodDecl->getParent())
return {};		return {};

const auto *RecordDecl = MethodDecl->getParent();		const auto *RecordDecl = MethodDecl->getParent();
if (!RecordDecl || !RecordDecl->isInStdNamespace())		if (!RecordDecl || !RecordDecl->isInStdNamespace())
		vsavchenko (Done): Is there any reason it's not a method of `FindWhereConstrained`?
		RedDocMD (Done): Not really.
return {};		return {};
		vsavchenko (Done): Predicates of this kind shouldn't be scattered throughout the code. They should definitely be unified and put into a function that is shared by the checker and other parts of the model. One simple example: what if we need to support other types of smart pointers? Should we go all over the code looking for places like this, or fix it in one place? It also naturally raises the question "What about `shared_ptr`?"

		vsavchenko (Done): After that you have 3 distinct cases to handle. It's a good opportunity for extracting them into separate functions.
const auto *TSD = dyn_cast<ClassTemplateSpecializationDecl>(RecordDecl);		const auto *TSD = dyn_cast<ClassTemplateSpecializationDecl>(RecordDecl);
if (!TSD)		if (!TSD)
return {};		return {};

auto TemplateArgs = TSD->getTemplateArgs().asArray();		auto TemplateArgs = TSD->getTemplateArgs().asArray();
if (TemplateArgs.size() == 0)		if (TemplateArgs.size() == 0)
return {};		return {};
		NoQ (Done): `SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());`.
auto InnerValueType = TemplateArgs[0].getAsType();		auto InnerValueType = TemplateArgs[0].getAsType();
return C.getASTContext().getPointerType(InnerValueType.getCanonicalType());		return C.getASTContext().getPointerType(InnerValueType.getCanonicalType());
}		}

// Helper method to pretty print region and avoid extra spacing.		// Helper method to pretty print region and avoid extra spacing.
static void checkAndPrettyPrintRegion(llvm::raw_ostream &OS,		static void checkAndPrettyPrintRegion(llvm::raw_ostream &OS,
const MemRegion *Region) {		const MemRegion *Region) {
		vsavchenko (Done): Just wondering if `return {};` will be sufficient here.
		RedDocMD (Done): Yup
if (Region->canPrintPretty()) {		if (Region->canPrintPretty()) {
		vsavchenko (Done): I think it's better to use `IgnoreParensAndCasts` instead of manual traversal.
		RedDocMD (Done): What is `IgnoreParensAndCasts`? I didn't find it anywhere in the source code (with ripgrep, that is).
		teemperor (Done): Just a typo, the actual name is `IgnoreParenCasts` (`Expr::IgnoreParenCasts`).
OS << " ";		OS << " ";
Region->printPretty(OS);		Region->printPretty(OS);
}		}
}		}

bool SmartPtrModeling::isBoolConversionMethod(const CallEvent &Call) const {		bool SmartPtrModeling::isBoolConversionMethod(const CallEvent &Call) const {
// TODO: Update CallDescription to support anonymous calls?		// TODO: Update CallDescription to support anonymous calls?
// TODO: Handle other methods, such as .get() or .release().		// TODO: Handle other methods, such as .get() or .release().
// But once we do, we'd need a visitor to explain null dereferences		// But once we do, we'd need a visitor to explain null dereferences
		vsavchenko (Not Done): I have two major questions about this implementation:
		- Why don't we need an actual check for `IfStmt`? Won't it trigger on `bool unused = !pointer;`? And if so, it won't mean constrained.
		- Why do we only care about implicit pointer-to-bool conversion? What about situations like `pointer == nullptr`, `NULL != pointer`, `__builtin_expect(pointer, 0)`, etc.?
// that are found via such modeling.		// that are found via such modeling.
		vsavchenko (Done): I think it's better to unite these two into `if (!E || E->getCastKind()...)`
const auto *CD = dyn_cast_or_null<CXXConversionDecl>(Call.getDecl());		const auto *CD = dyn_cast_or_null<CXXConversionDecl>(Call.getDecl());
return CD && CD->getConversionType()->isBooleanType();		return CD && CD->getConversionType()->isBooleanType();
}		}

		NoQ (Not Done): Your visitor doesn't need to track raw pointers through raw variables. `trackExpressionValue()` is fully capable of doing this. The problem only becomes checker-specific when the connection between smart pointers and raw pointers is involved. I meant assignments between two smart pointers. Maybe even reduce the visitor to only the if-statement case and have it check all interesting smart pointers instead of a specific smart pointer. Propagation of interestingness across smart pointers can be handled by note tags for move/copy assignments and constructors.
bool SmartPtrModeling::evalCall(const CallEvent &Call,		bool SmartPtrModeling::evalCall(const CallEvent &Call,
		vsavchenko (Done): I guess it escaped during code refactoring: `SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());`
CheckerContext &C) const {		CheckerContext &C) const {
ProgramStateRef State = C.getState();		ProgramStateRef State = C.getState();
if (!smartptr::isStdSmartPtrCall(Call))		if (!smartptr::isStdSmartPtrCall(Call))
return false;		return false;

if (isBoolConversionMethod(Call)) {		if (isBoolConversionMethod(Call)) {
const MemRegion *ThisR =		const MemRegion *ThisR =
cast<CXXInstanceCall>(&Call)->getCXXThisVal().getAsRegion();		cast<CXXInstanceCall>(&Call)->getCXXThisVal().getAsRegion();

if (ModelSmartPtrDereference) {		if (ModelSmartPtrDereference) {
// The check for the region is moved is duplicated in handleBoolOperation		// The check for the region is moved is duplicated in handleBoolOperation
// method.		// method.
// FIXME: Once we model std::move for smart pointers clean up this and use		// FIXME: Once we model std::move for smart pointers clean up this and use
// that modeling.		// that modeling.
handleBoolConversion(Call, C);		handleBoolConversion(Call, C);
return true;		return true;
} else {		} else {
if (!move::isMovedFrom(State, ThisR)) {		if (!move::isMovedFrom(State, ThisR)) {
// TODO: Model this case as well. At least, avoid invalidation of		// TODO: Model this case as well. At least, avoid invalidation of
		vsavchenko (Done): Similar note here: `if (!BO || BO->getOpcode()...)`
// globals.		// globals.
return false;		return false;
		vsavchenko (Done): Variables are capitalized.
}		}
		vsavchenko (Done): It is a widespread pattern in LLVM to declare such variables directly in `if` statements: `if (auto Report = bugReportOnGet(RHS)) return Report;`
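The declare-in-`if` pattern mentioned here can be shown in isolation. A minimal sketch, assuming hypothetical helpers `reportOnGet` and `reportOnReset` that return a message or `nullptr`; they only imitate the shape of `bugReportOnGet` from the comment and are not the patch's functions.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: each either produces a message or nothing.
const char *reportOnGet(int Kind) { return Kind == 1 ? "get" : nullptr; }
const char *reportOnReset(int Kind) { return Kind == 2 ? "reset" : nullptr; }

// Each candidate is declared in the condition, so its scope ends with
// the `if` and the name can be reused for the next attempt.
const char *firstReport(int Kind) {
  if (const char *Report = reportOnGet(Kind))
    return Report;
  if (const char *Report = reportOnReset(Kind))
    return Report;
  return nullptr;
}
```

The variable never outlives the check that uses it, which is the narrow scoping the reviewers ask for elsewhere in this thread.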
		vsavchenko (Done): It's better to use `IgnoreParenCasts` here as well.

// TODO: Add a note to bug reports describing this decision.		// TODO: Add a note to bug reports describing this decision.
C.addTransition(State->BindExpr(		C.addTransition(State->BindExpr(
		vsavchenko (Done): And here
Call.getOriginExpr(), C.getLocationContext(),		Call.getOriginExpr(), C.getLocationContext(),
C.getSValBuilder().makeZeroVal(Call.getResultType())));		C.getSValBuilder().makeZeroVal(Call.getResultType())));

return true;		return true;
}		}
}		}

		vsavchenko (Done): So, situations like `int *a = nullptr, *b = smart.get();` are not supported?
		RedDocMD (Done): No, it works even in that case (I have added a test for that). It has to do with how the AST data structures work (`int *a = nullptr, *b = smart.get();` is considered a single decl).
if (!ModelSmartPtrDereference)		if (!ModelSmartPtrDereference)
return false;		return false;
		vsavchenko (Done): `llvm::find_if`
		RedDocMD (Done): Not sure if that'll work neatly, since I actually need the return value of the predicate function (the report).

if (const auto *CC = dyn_cast<CXXConstructorCall>(&Call)) {		if (const auto *CC = dyn_cast<CXXConstructorCall>(&Call)) {
if (CC->getDecl()->isCopyConstructor())		if (CC->getDecl()->isCopyConstructor())
return false;		return false;

const MemRegion *ThisRegion = CC->getCXXThisVal().getAsRegion();		const MemRegion *ThisRegion = CC->getCXXThisVal().getAsRegion();
if (!ThisRegion)		if (!ThisRegion)
return false;		return false;

if (CC->getDecl()->isMoveConstructor())		if (CC->getDecl()->isMoveConstructor())
return handleMoveCtr(Call, C, ThisRegion);		return handleMoveCtr(Call, C, ThisRegion);

if (Call.getNumArgs() == 0) {		if (Call.getNumArgs() == 0) {
		teemperor (Done): LLVM code style mandates no curly braces around single-line ifs.
auto NullVal = C.getSValBuilder().makeNull();		auto NullVal = C.getSValBuilder().makeNull();
State = State->set<TrackedRegionMap>(ThisRegion, NullVal);		State = State->set<TrackedRegionMap>(ThisRegion, NullVal);

		vsavchenko (Done): This level of nesting is frowned upon. It is a good sign that the function should be refactored. The following code: `if (cond1) { . . . if (cond2) { . . . if (cond3) { . . . } } } return nullptr;` can be refactored into: `if (!cond1) return nullptr; . . . if (!cond2) return nullptr; . . . if (!cond3) return nullptr; . . .` It is easier to follow the logic if the function is composed in this manner, because from the very beginning you know that an `else` with more stuff is not going to follow.
		RedDocMD (Done): Do you still think that's the case now (after breaking it into functions)? I also think that for the sort of pattern matching we are doing in this patch, the nested ifs make more sense.
		NoQ (Not Done): That's not what the assert is saying; the assert is saying that the `DeclStmt` has exactly one `Decl`. It basically forbids code like `int x = 1, y = 2;`. You may wonder why you don't crash all over the place. That's because Clang CFG creates its own `DeclStmt`s that aren't normally present in the AST and that always have exactly one declaration. This is necessary because there may be non-trivial control flow between these declarations (due to, say, the presence of operator `?:` in the initializer), so they have to be represented as different elements (possibly even in different blocks) in the CFG.
		RedDocMD (Done): So I guess the tests at lines `317` and `378` of `smart-ptr-text-output.cpp` work because of the CFG messing with the AST? So should I remove the assert?
		RedDocMD (Done): @NoQ?
		NoQ (Not Done):
		> So I guess the tests at lines 317 and 378 of smart-ptr-text-output.cpp work because of the CFG messing with the AST?
		Yes. The rest of the static analyzer works for the same reason; a lot of code relies on it.
		> So should I remove the assert?
		The assert is correct but the message is wrong / misleading.
C.addTransition(		C.addTransition(
State, C.getNoteTag([ThisRegion](PathSensitiveBugReport &BR,		State, C.getNoteTag([ThisRegion](PathSensitiveBugReport &BR,
llvm::raw_ostream &OS) {		llvm::raw_ostream &OS) {
if (&BR.getBugType() != smartptr::getNullDereferenceBugType() ||		if (&BR.getBugType() != smartptr::getNullDereferenceBugType() ||
!BR.isInteresting(ThisRegion))		!BR.isInteresting(ThisRegion))
return;		return;
OS << "Default constructed smart pointer";		OS << "Default constructed smart pointer";
checkAndPrettyPrintRegion(OS, ThisRegion);		checkAndPrettyPrintRegion(OS, ThisRegion);
OS << " is null";		OS << " is null";
}));		}));
		teemperor (Done): (I think this was already pointed out, but early exits are the way to go in LLVM.) `const auto *DS = llvm::dyn_cast<DeclStmt>(S); if (!DS) return nullptr; const Decl *D = DS->getSingleDecl(); const auto *VD = llvm::dyn_cast<VarDecl>(D); if (!VD) return nullptr; ....`
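The early-exit style that vsavchenko and teemperor ask for can be compared side by side on a toy example. A minimal sketch, assuming a hypothetical `Item` record and two hypothetical functions; it mirrors the shape of the suggested refactoring, not the actual visitor code.

```cpp
#include <cassert>

// Hypothetical record used only for this illustration.
struct Item {
  bool IsDecl;
  bool HasInit;
  int InitValue;
};

// Nested form: the interesting result is buried three levels deep.
const int *findInitNested(const Item *I) {
  if (I) {
    if (I->IsDecl) {
      if (I->HasInit) {
        return &I->InitValue;
      }
    }
  }
  return nullptr;
}

// Early-exit form: each precondition is rejected up front, so the
// success path stays at the top indentation level.
const int *findInitEarlyExit(const Item *I) {
  if (!I)
    return nullptr;
  if (!I->IsDecl)
    return nullptr;
  if (!I->HasInit)
    return nullptr;
  return &I->InitValue;
}
```

Both functions return the same result for every input; only the control-flow layout differs.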
} else {		} else {
const auto *TrackingExpr = Call.getArgExpr(0);		const auto *TrackingExpr = Call.getArgExpr(0);
assert(TrackingExpr->getType()->isPointerType() &&		assert(TrackingExpr->getType()->isPointerType() &&
"Adding a non pointer value to TrackedRegionMap");		"Adding a non pointer value to TrackedRegionMap");
auto ArgVal = Call.getArgSVal(0);		auto ArgVal = Call.getArgSVal(0);
State = State->set<TrackedRegionMap>(ThisRegion, ArgVal);		State = State->set<TrackedRegionMap>(ThisRegion, ArgVal);

C.addTransition(State, C.getNoteTag([ThisRegion, TrackingExpr,		C.addTransition(State, C.getNoteTag([ThisRegion, TrackingExpr,
[...]	void SmartPtrModeling::handleGet(const CallEvent &Call,
if (!IC)		if (!IC)
return;		return;

const MemRegion *ThisRegion = IC->getCXXThisVal().getAsRegion();		const MemRegion *ThisRegion = IC->getCXXThisVal().getAsRegion();
if (!ThisRegion)		if (!ThisRegion)
return;		return;

SVal InnerPointerVal;		SVal InnerPointerVal;
		const auto *CallExpr = Call.getOriginExpr();
		steakhal (Not Done): Nit: Declare the variable as close to the usage as you can, and in the narrowest scope.
if (const auto *InnerValPtr = State->get<TrackedRegionMap>(ThisRegion)) {		if (const auto *InnerValPtr = State->get<TrackedRegionMap>(ThisRegion)) {
InnerPointerVal = *InnerValPtr;		InnerPointerVal = *InnerValPtr;
} else {		} else {
const auto *CallExpr = Call.getOriginExpr();
InnerPointerVal = C.getSValBuilder().conjureSymbolVal(		InnerPointerVal = C.getSValBuilder().conjureSymbolVal(
CallExpr, C.getLocationContext(), Call.getResultType(), C.blockCount());		CallExpr, C.getLocationContext(), Call.getResultType(), C.blockCount());
State = State->set<TrackedRegionMap>(ThisRegion, InnerPointerVal);		State = State->set<TrackedRegionMap>(ThisRegion, InnerPointerVal);
}		}

State = State->BindExpr(Call.getOriginExpr(), C.getLocationContext(),		State = State->BindExpr(Call.getOriginExpr(), C.getLocationContext(),
InnerPointerVal);		InnerPointerVal);
// TODO: Add NoteTag, for how the raw pointer got using 'get' method.
C.addTransition(State);		// TODO: Get rid of this once D97183 is settled
		// Store the CallExpr as being obtained through get. It will be necessary in
		// isExprObtainedFromGet().
		const auto *ExistingSet = State->get<ExprsFromGet>(ThisRegion);
		auto BaseSet = ExistingSet ? *ExistingSet : StmtSetFactory.getEmptySet();
		steakhal (Not Done): Why don't you 'save' the MemRegion only if the inner pointer is not proven to be null? This would relieve you from checking it later. Nit: I don't like such if branches. The last statement is identical, which is a code smell. It's better to think of this as a function taking stuff and producing a State. An immediately called lambda expression would enclose any local variables, and it would suggest that the algorithm it implements is self-contained. I know that I'm the only fan of immediately called lambda expressions, though.
		auto NewStmtSet = StmtSetFactory.add(BaseSet, CallExpr);
		State = State->set<ExprsFromGet>(ThisRegion, NewStmtSet);
		steakhal (Done): You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed.
		RedDocMD (Done):
		> You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed.
		@steakhal Does it look better now?
		steakhal (Done): Can we cover both branches with tests?
		RedDocMD (Done): I am not sure if this can be done, because right now the only smart pointer that has been handled is `std::unique_ptr`. It turns out that it can already be pretty-printed, so I don't know how to test the other branch.

		C.addTransition(State); // The note is added later by a visitor.
}		}
		vsavchenko (Done): I generally don't like repeating code in both branches of the `if` statement. Here they share the following logic: add `CallExpr` and update the state. We can easily update the code to share it: `const auto *ExistingSet = State->get<ExprsFromGet>(ThisRegion); auto BaseSet = ExistingSet ? *ExistingSet : StmtSetFactory.getEmptySet(); auto NewStmtSet = StmtSetFactory.add(BaseSet, CallExpr); State = State->set<ExprsFromGet>(ThisRegion, NewStmtSet);`
		RedDocMD (Done): Right, thanks :)
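The immediately called lambda expression that steakhal mentions for `handleGet` can be sketched without any analyzer types. This is an illustration under stated assumptions: `ExprSets` and `recordGet` are hypothetical, integers stand in for regions and call expressions, and a plain `std::map` stands in for the immutable program state.

```cpp
#include <cassert>
#include <map>
#include <set>

// Hypothetical stand-in for the ExprsFromGet state: region id -> call ids.
using ExprSets = std::map<int, std::set<int>>;

// Takes a state by value and returns the updated one, in the spirit of
// "a function taking stuff and producing a State".
ExprSets recordGet(ExprSets State, int Region, int CallId) {
  // The lambda is invoked on the spot; It and Base stay local to it and
  // only the finished set escapes.
  const std::set<int> NewSet = [&] {
    auto It = State.find(Region);
    std::set<int> Base = It != State.end() ? It->second : std::set<int>();
    Base.insert(CallId);
    return Base;
  }();
  State[Region] = NewSet;
  return State;
}
```

The shared "look up the existing set, add the call, store it back" logic appears once, which is also what vsavchenko's suggestion achieves with straight-line code.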

bool SmartPtrModeling::handleAssignOp(const CallEvent &Call,		bool SmartPtrModeling::handleAssignOp(const CallEvent &Call,
CheckerContext &C) const {		CheckerContext &C) const {
ProgramStateRef State = C.getState();		ProgramStateRef State = C.getState();
const auto *OC = dyn_cast<CXXMemberOperatorCall>(&Call);		const auto *OC = dyn_cast<CXXMemberOperatorCall>(&Call);
if (!OC)		if (!OC)
return false;		return false;
OverloadedOperatorKind OOK = OC->getOverloadedOperator();		OverloadedOperatorKind OOK = OC->getOverloadedOperator();
[...]

clang/test/Analysis/smart-ptr-text-output.cpp

[...]	int callingFooWithUnknownPointer(std::unique_ptr<int> PUnknown) {
foo(); // No note on Calling 'S::foo'		foo(); // No note on Calling 'S::foo'
P.reset(new int(0)); // expected-note {{Assigning 0}}		P.reset(new int(0)); // expected-note {{Assigning 0}}
return 1 / *(P.get()); // expected-warning {{Division by zero [core.DivideZero]}}		return 1 / *(P.get()); // expected-warning {{Division by zero [core.DivideZero]}}
// expected-note@-1 {{Division by zero}}		// expected-note@-1 {{Division by zero}}
}		}
};		};

void derefAfterBranchingOnUnknownInnerPtr(std::unique_ptr<A> P) {
  A *RP = P.get(); // expected-note {{Obtained null inner pointer from 'P'}}
  if (!RP) { // expected-note {{Assuming 'RP' is null}}
    // expected-note@-1 {{Taking true branch}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void multpleDeclsWithGet(std::unique_ptr<A> P) {
  A *dummy1 = nullptr, *RP = P.get(), *dummy2; // expected-note {{Obtained null inner pointer from 'P'}}
  if (!RP) { // expected-note {{Assuming 'RP' is null}}
    // expected-note@-1 {{Taking true branch}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void multipleGetsShouldNotAllHaveNotes(std::unique_ptr<A> P) {
  A *RP = P.get(); // expected-note {{Obtained null inner pointer from 'P'}}
  A *dummy1 = P.get();
  A *dummy2 = P.get();
  if (!RP) { // expected-note {{Assuming 'RP' is null}}
    // expected-note@-1 {{Taking true branch}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void getShouldNotAlwaysLeaveANote() {
  std::unique_ptr<A> P; // expected-note {{Default constructed smart pointer 'P' is null}}
  A *a = P.get();
  P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
  // expected-note@-1{{Dereference of null smart pointer 'P'}}
}

void getShouldNotLeaveANoteAfterReset(std::unique_ptr<A> P) {
  A *a = P.get();
  P.reset(); // expected-note {{Smart pointer 'P' reset using a null value}}
  P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
  // expected-note@-1{{Dereference of null smart pointer 'P'}}
}

void getShouldNotLeaveNoteWhenPtrNotUsed(std::unique_ptr<A> P) {
  A *a = P.get();
  if (!P) { // expected-note {{Taking true branch}}
    // expected-note@-1 {{Assuming smart pointer 'P' is null}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void getShouldLeaveANoteWithWhileLoop(std::unique_ptr<A> P) {
  A *RP = P.get(); // expected-note {{Obtained null inner pointer from 'P'}}
  while (!RP) { // expected-note {{Assuming 'RP' is null}}
    // expected-note@-1 {{Loop condition is true. Entering loop body}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void getShouldLeaveANoteWithForLoop(std::unique_ptr<A> P) {
  for (A *RP = P.get(); !RP;) { // expected-note {{Assuming 'RP' is null}}
    // expected-note@-1 {{Loop condition is true. Entering loop body}}
    // expected-note@-2 {{Obtained null inner pointer from 'P'}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}

void getShouldLeaveNoteOnChaining(std::unique_ptr<A> P) {
  A *praw = P.get(), *other; // expected-note {{Obtained null inner pointer from 'P'}}
  other = praw; // expected-note {{Obtained null value here}}
  if (!other) { // expected-note {{Assuming 'other' is null}}
    // expected-note@-1 {{Taking true branch}}
    P->foo(); // expected-warning {{Dereference of null smart pointer 'P' [alpha.cplusplus.SmartPtr]}}
    // expected-note@-1{{Dereference of null smart pointer 'P'}}
  }
}
NoQ: Looks like your git history is acting up. Your patch adds this test right? Are there more proposed changes in the cpp files that aren't currently highlighted for a similar reason? I'll try to play with your patch locally once this is fixed ^.^

RedDocMD: Yeah I seem to have tripped over the single commit rule. It should be fixed now.
		No newline at end of file
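
The runtime facts these test cases lean on can be checked in plain C++ without the analyzer. The following is a minimal sketch, not part of the patch: `innerPointerIsNull` and `demo` are hypothetical names introduced only for illustration, and `A` is a stand-in for the test file's class. It shows that a default-constructed `std::unique_ptr` holds null, that `get()` merely observes the current inner pointer, and that `reset()` with no argument makes the smart pointer null again.

```cpp
#include <cassert>
#include <memory>

// Stand-in for the class used in the tests (assumption: only foo() matters).
struct A {
  void foo() {}
};

// Hypothetical helper: true if get() yields a null raw pointer.
bool innerPointerIsNull(const std::unique_ptr<A> &P) {
  const A *RP = P.get(); // get() only observes; ownership is unchanged
  return RP == nullptr;
}

// Hypothetical walk-through of the smart pointer states the tests exercise.
void demo() {
  std::unique_ptr<A> Empty; // default constructed: holds null
  assert(innerPointerIsNull(Empty));
  assert(!Empty); // operator bool agrees with get()

  std::unique_ptr<A> Full(new A());
  assert(!innerPointerIsNull(Full));
  Full->foo(); // safe: the inner pointer is non-null here

  Full.reset(); // reset with no argument: holds null again
  assert(innerPointerIsNull(Full));
}
```

Calling `demo()` runs through the three states. None of this says when the analyzer attaches a note; that depends on whether the raw pointer value is tracked in the bug report, as discussed in the review.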


[analyzer] Add NoteTag for smart-ptr get() (Needs Review, Public)


Revision Contents

Diff 349332

clang/lib/StaticAnalyzer/Checkers/SmartPtr.h

clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp

clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp

clang/test/Analysis/smart-ptr-text-output.cpp
