Removed a TODO by adding a note-tag to explain where the
raw pointer came from.
Details
- Reviewers
- NoQ - vsavchenko - dcoughlin - xazax.hun - teemperor 
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
Looks fine to me.
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 466–467 | You should emit a sort of grammatically correct diagnostic message even if the region can not be pretty-printed. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 466–467 | Can we cover both branches with tests? | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 466–467 | I am not sure if this can be done. Because, right now the only Smart Pointer that has been handled is std::unique_ptr. It turns out that it can be pretty printed already, so I don't know how to test the other branch. | |
Nice!
I suspect you're adding too many notes. The note needs to not be there if the *raw* pointer is not tracked. Eg., I suspect that your patch would add a note in the following case in which it shouldn't be there because the raw pointer value doesn't participate in the report despite smart pointer region being interesting:
std::unique_ptr<A> P; A *a = P.get(); // shound't emit a note here P->foo();
It's important to not emit redundant notes because users typically take these checker-specific notes as an indication that this information is an essential piece of evidence of the bug in their program. In this example they'd believe that the analyzer has figured out that the smart pointer is null by looking at what happens to the raw pointer value. So they may become very confused if this isn't the case.
@NoQ, in the example you have given, isn't the smart-pointer P null? So shouldn't a warning be emitted for de-referencing it? Or is it that since a is not being used, a warning shouldn't be emitted? Sorry, I could not quite understand that point ...
The warning should be emitted but it shouldn't have a note at P.get() telling the user that an inner pointer was obtained.
@NoQ, I guess I would need to figure out a way to find out if the raw pointer obtained from get() is being used somewhere or is being constrained. I am trying to first figure out whether the raw pointer is being constrained to null, causing a null-deref to detected.
If the inner pointer participates in a branch condition guarding the dereference, that memory region is gotta be important, right? So, we should mark it so.
A BugreportVisitor could easily transfer the information about the fact that the dereference was guarded by that particular branch condition - and marking the InnerPointerVal (MemRegion) important.
This way the NoteTag for the get() could emit the warning.
The other approach, you @RedDocMD proposed about checking the constraint for the inner pointer, seems somewhat odd to me.
It could work, but I think the visitor is cleaner.
TBH, I don't like my approach either. I feel that it leaves out some cases.
The InnerPointerVal memory region is not marked as interesting as of now, I have tried that out. The branch condition constraint is set by the ConstraintManager and it is queried via in the State in the method smartptr::isNullSmartPtr at SmartPtrModelling.cpp:104. I have to see if the ConstraintManager can mark the memory region as important. @steakhal, @NoQ what do you think?
Hm, I don't think you can make this work.
The deref bug is reported only if the smartptr::isNullSmartPtr(State, ThisRegion) is true. Which is only true if the InnerPointVal is known to be null. So the information on how we get to know that the smart pointer is null is already lost.
From this perspective, I don't think you have any other choice than to walk back from the bug to the root using a bugreport visitor - and check whether or not the inner pointer is used in a branch condition.
I might be wrong about this, since this was the first time I had a deeper look at the SmartPtrChecker.
The TaintBugVisitor could give you a hint on how to implement this.
In some cases BugReport.isInteresting(InnerPointerVal.getAsSymbol()) would yield us exactly what we want. But if there's no symbol we have no choice but to interact with the trackExpressionValue facility and this may turn out to be pretty challenging.
We could, for instance, teach it to mark exploded nodes as interesting when it changes tracking mode. That'd be a completely new form of interestingness for us to track. Or maybe pairs (exploded node, expression) so that to be more specific. Then we could query it from our note tag.
I am trying to use scan-build on a file to see what sort of errors are reported:
./llvm-project/release/bin/scan-build -o . -enable-checker alpha.cplusplus.SmartPtr -analyzer-config alpha.cplusplus.SmartPtrModelling:ModelSmartPtrDereference=true  clang++ -c uniq_deref.cpp.
@NoQ, @vsavchenko why does this not work?
why does this not work?
How does this not work? What does it say?
what is tracking mode for an ExplodedNode?
Mmm yeah, in hindsight i should have explained it much more.
First of all, we have "bug visitors". They are the entity that adds notes to bug reports. Note tags are a new way to add notes to the bug report but it's still the same visitors under the hood, i.e. there's visitor that scans note tags and invokes their callback to produce notes. These visitors scan the report from bottom to top. They typically emit note whenever something changes in program state. For instance, if an interesting pointer symbol in the program state changes its status from "allocated" to "release" (i.e., the visitor was so far only seeing nodes in which it was released but now it encounters the first node in which it's not released yet) then it adds a note "pointer was released" which is relevant for use-after-free warnings.
Now, sometimes we have to emit notes with respect to values that aren't symbols. For instance, in null dereference bugs we have to explain the movement of a value that's a plain and simple null pointer. Unlike symbols who each have unique identity that captures their backstory and properties, such "concrete" values are indistinguishable from each other. If we see a null in the older state and a null in a newer state, we can't tell if it's the same null or a different null. This makes it much harder to explain the journey a specific null value has undertaken in order to end up in our pointer that we've ended up dereferencing.
This is where trackExpressionValue comes in. It knows how to track concrete values such as Null or Undefined. The way it works is that it tracks something in the state that corresponds to that null value but does have an identity, typically either memory regions ("the null pointer is currently stored in this variable") or expressions ("the null pointer is currently being returned from this call expression"). Neither memory regions nor (especially!) expressions are guaranteed to stay the same throughout the entire journey so we have to skip from one to the other in order to keep tracking our null pointer (say, "the null pointer was returned from a call expression which acts as an initializer to a variable; now we can stop tracking the variable and start tracking the call expression"). This is what I referred to as changing modes. It's also very clear from the static analyzer code where the mode changes: namely, trackExpressionValue is not a single visitor but a combination of visitors that recursively attach new instances of themselves to the report as they ascend it. For instance, in the above example a visitor that tracks a variable would finish and attach a new visitor that tracks a call expression.
So basically i suspect that the act of reattaching the visitor could be documented through interestingness for your checker to pick up. That would allow you to query whether the call-expression P.get() returns an interesting null value as opposed to a dull, ordinary null value that's unrelated to the report.
@NoQ, looking through the source code of trackExpressionValue I can see that it adds many visitors to the BugReport passed to it. That I believe is the recursive attachment of visitors you described above.
So, as far as I understood, I have to make changes in this function to mark an ExplodedNode as interesting when it changes tracking mode. This change is marked by when recursively a new visitor is attached (or at least in some of those places, the exact places will have to be figured out).
Then this can be queried from the checker to obtain the information that is needed.
Am I thinking on the right track?
I can see that it adds many visitors to the BugReport passed to it.
Yes and some of these visitors will call trackExpressionValue() again in their Visit...() functions which corresponds to adding visitors in the middle of visitation which is arguably the most interesting part.
@NoQ, I am using trackExpressionValue to add intersetingness to the InnerPointerVal. That solves the original problem. However, it is causing the MoveChecker to add extra warnings to use-after-move cases. Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker. It seems to me that two separate checkers are tracking the same bug - use after move. 
So should I make another patch to modify SmartPointerModelling to not emit warnings on use after move (instead just leaving the GDM updating code)? Or is there a better solution to this?
@NoQ, sorry for the absurdly dumb mistake. Not entirely sure what I was thinking.
Can you please have a look at it now?
By tracking the call-expression you're basically tracking the raw pointer value because that's what operators * and -> return. Of course operator * returns an lvalue reference rather than a pointer but we don't make a difference when it comes to SVal representation.
So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely. I think this is quite a statement! I'd like a stronger evidence for that than passing a couple of tests. Does the following test work?:
void test(std::unique_ptr<A> P) { A *a = P.get(); // unlike your positive test this doesn't deserve a note // because we weren't looking at 'a' when we concluded // that the pointer is null if (!P) { P->foo(); } }
Essentially, when a unique_ptr is moved and subsequently used, it triggers two warnings - one from SmartPointerModelling and another from MoveChecker
Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?
Does the following test work?
@NoQ, it seems to be working.
So you're saying that simply by always tracking the (final) raw pointer value and checking whether the raw value is interesting upon .get() you dodge the communication problem entirely
I would not say it has been dodged, but rather that problem had already been solved by trackExpressionValue. At line 1949 of BugReporterVisitors.cpp (inside the trackExpressionValue function) is:
if (LVState->getAnalysisManager().getAnalyzerOptions().ShouldTrackConditions)
    report.addVisitor(std::make_unique<TrackControlDependencyCondBRVisitor>(
          InputNode));Approximately, TrackControlDependencyCondBRVisitor is a visitor that looks into condition statements and via mutual recursion with trackExpressionValue marks SVal's as interesting if they are used in a condition and that condition constrains the Expr on which the visitor was originally called on. This gave me the idea that calling trackExpressionValue is all that we really need to do, since it already contains a visitor to discover the interestingness we need. Looking into this function made me feel that trackExpressionValue is actually a very powerful function which solves a lot of these communication problems.
Do i understand correctly that this doesn't happen anymore when you stopped creating a new node?
Yes, and I found out my blunder after staring at the exploded graph dump. Creating a new node was un-necessary since trackExpressionValue needs a node corresponding to the expression where we find the bug, and that was already being created above.
I did not follow the discussion closely but we (CodeChecker team) might have a similar problem.
Consider this: https://godbolt.org/z/835P38
int do_bifurcation(int p) { return p < 0; } int b(int x, int y) { int tmp = 13 / y; // y can't be 0. (void)tmp; int p0 = do_bifurcation(x); // There is a path where p0 is 0. int div = p0 * y; // So, div also becomes 0 on that path. return 1 / div; }
However, the bugreport tells us that you do a division by zero, which was initialized a line above.
Do you think it is a related issue @NoQ?
No-no, TrackControlDependencyCondBRVisitor's purpose is completely different. It tracks symbols that didn't necessarily participate in the report but participated in conditional statements that lexically surround the report. It's used for explaining how did we got ourselves into a given lexical spot but it doesn't explain why is this a problem.
We can get it out of the way:
A *a = P.get(); // no note expected
if (!P) {}
P->foo();vs.
A *a = P.get(); // expected note
if (!a) {}
P->foo();I suspect that it may still work and the reason this works is because we're not collapsing .get()'s return value to null when it's constrained to null. Given that the only interesting thing that could happen to the return value of .get() is getting constrained to null (because it's an rvalue, the programmer can't use it to overwrite the raw pointer value inside the pointer), it's either already null or you'd see the constraint tracked once you mark the symbol as interesting.
The reason i'd still not like this solution is because collapsing the symbol to null on .get() (if it's already constrained to null) is arguably the preferred behavior as it makes constraint solver's life easier. But that'd most likely break your tracking solution.
I don't see any smart pointers or null dereferences so... no? But there's definitely an issue with tracking in your example. It's definitely correct that div is initialized with zero on the path on which p0 is zero but the events that led to p0 being zero are not explained which is a bug that needs to be fixed.
| clang/test/Analysis/smart-ptr-text-output.cpp | ||
|---|---|---|
| 317 | Looks like your git history is acting up. Your patch adds this test right? Are there more proposed changes in the cpp files that aren't currently highlighted for a similar reason? I'll try to play with your patch locally once this is fixed ^.^ | |
| clang/test/Analysis/smart-ptr-text-output.cpp | ||
|---|---|---|
| 317 | Yeah I seem to have tripped over the single commit rule. It should be fixed now. | |
@NoQ, why does the following trigger a null-dereference warning? (https://godbolt.org/z/Kxox8qd16)
void g(std::unique_ptr<A> a) {
  A *aptr = a.get();
  if (!aptr) {}
  a->foo();
}When a->foo() is called, the constraint !aptr is no longer valid and so InnerPointerVal corresponding to a is no longer constrained to be null.
Am I missing something?
@NoQ, I have taken a different approach this time. I have used a visitor and am storing some more data in the GDM. Together, they distinguish between the following three cases:
- If the raw pointer obtained from get() is constrained to null in a path which leads to a Node (and thus State) where a smart-pointer-null-deref bug occurs.
- If the raw pointer was null to begin with (because the smart-pointer was null)
- If the raw pointer was not null to begin with but the smart-ptr became null after that.
Only in the first case should the note be emitted. I have added some more tests to that effect.
Can you please have a look at this?
When the if's condition is evaluated, it probably triggered a state split. On one path the aptr (aka. the inner pointer) will be constrained to null.
The only way to be sure is by checking the exploded graph and see where it goes.
All in all, I see where it's going. I don't know, it might be the right thing to do. I haven't spent much time on this topic though.
See my inline comments.
| clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp | ||
|---|---|---|
| 87 | I'm not sure if we should expect 16 unique places where uptr::get() called on a path. I would guess 4 or 2 is more than enough. | |
| 106–109 | So you are trying to find the assignment, where the inner pointer is assigned to a variable. What you want to achieve is slightly similar to FindLastStoreBRVisitor. You should have a look at that. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 449 | Nit: Declare the variable as close to the usage as you can. In the narrowest scope as well. | |
| 464–465 | Why don't you 'save' the MemRegion only if the inner pointer is not proven to be null. This would relieve you from checking it later. Nit: I don't like such if branches. The last statement is identical, which is a code smell. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp | ||
|---|---|---|
| 106–109 | Please elaborate on that. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp | ||
|---|---|---|
| 106–109 | Sorry. I should have written it out better. 
 Yes and no. I am indeed trying to find where the first assignment occurred, since re-assigning to the pointer obtained from get() doesn't give any information regarding regarding the smart pointer being null or not. So what this visitor does that it goes back to the original assignment to find out what SVal was bound to a.get(). The Environment doesn't have this info since it is garbage collected somewhere on the way. Also accessing this State allows me to check whether the SVal was null to begin with. | |
Yes, that's, like, the whole point. We report unchecked use after a check. If the pointer is never null, why check? If the pointer is sometimes null, why use without a check? The code clearly doesn't make sense. That's what the report says: "assuming 'aptr' is null, there's null dereference on the next line". Once our simulation leaves the lexical scope of the if-condition 'aptr' doesn't suddenly become non-null. Technically speaking, we have no notion of a merge at all. There is literally no merge operation defined on our ProgramState, we do not perform fixpoint iteration, we do not try to combat path explosion, we simply simulate the behavior of the program on a few possible execution paths (defined by branching in the program) and report potential issues.
| clang/lib/StaticAnalyzer/Checkers/SmartPtrChecker.cpp | ||
|---|---|---|
| 106–109 | I think it's too late to act when .get() is already happening. Like, we're visiting from bottom to top, in reverse to how our normal abstract interpretation goes. So if we only start tracking when we reach .get() we won't be able to explain what happens to the pointer between obtaining it from .get() and constraining it to null. For instance, if it was moved from one raw pointer variable to another, we won't put a note there, but we should. Let's try the following. Write a visitor to detect the moment of time when the raw pointer value is getting constrained to null. I.e., find when State->assume(State->get<TrackedRegionMap>(ThisRegion)) stops working. It can be for two reasons: either the inner value is overwritten or it's constrained to null. If it's overwritten, track the newly set value and our job is done. If it's constrained to null, try to find out what's happening (is it an if-statement? is it an eagerly-assume action over a comparison operator?). We already have a common visitor that's good at figuring this out, maybe it'd be possible to reuse the code. In any case, start tracking the symbol and possibly emit a checker-specific note immediately ("raw pointer value constrained to null" or something like that). | |
Right, sorry for the late reply, @NoQ.
I will get to it once I get these assignments off my head.
For the following function:
void foo(std::unique_ptr<A> P) {
  A* praw = P.get();
  A* other = praw;
  if (other) {}
  P->foo();
}Where do we expect a note? Where praw is initialized, where other is initialized or both?
According to the existing analyzer logic, there is a bug. If you check other for null, we can conclude that there are circumstances when it is null indeed.
I think we can conclude that P must be non-null (since it was unconditionally dereferenced), thus the previous check on the inner pointer and the branch it guards must be dead! This fact deserves a report, you are right. My bad.
In this case, the report should show how the inner pointer got bound to the other. Thus, we should highlight both assignments.
Under the same logic we also can't report null dereference in the following code:
void bar() { A *p = nullptr; p->foo(); }
Indeed, the null pointer p is unconditionally dereferenced, therefore the entire function bar() must be dead!
Or maybe the entire executable binary into which this code is linked is never run. Some users definitely complain about static analyzer analyzing code that was entirely dead from the start, suggested integrating with the dynamic PGO facilities to analyze hot code first.
It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".
It's important to realize that with pure static analysis it is absolutely impossible to reliably report a bug more severe than dead code. Any form of static analysis only ever finds code that doesn't make sense. It cannot make assumptions about how often the code is executed in practice or how severe and impactful the bug is to the users of the program under analysis. When we report anything that doesn't directly scream "dead code", like null dereference, we're still always implicitly saying "This code doesn't make sense because it either has dead parts or _____". In fact we should probably do a better job at managing expectations because users do become upset when we promise them use-after-frees but in reality only find dead code that "would have caused use-after-frees if it was ever run".
Tbh, given how loose of a memory model we are dealing with (at its worst, it is the C memory model), I think the static-analyzer does a great job at detecting what it possibly can. As for the user's expectation, I think we just need to wait for more adoption of the static analyzer. Then users will know exactly what to expect (we do not yell at the C++ compiler for not preventing use after delete, we do not yell at the Rust compiler for not allowing mismatching lifetimes - we know what to expect, and just work with that).
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 166–168 | SVal ExprVal = State->getSVal(Sub, Node->getLocationContext());. | |
| 186–190 | Your visitor doesn't need to track raw pointers through raw variables. trackExpressionValue() is fully capable of doing this. The problem only becomes checker-specific when the connection between smart pointers and raw pointers is involved. I meant assignments between two smart pointers. Maybe even reduce the visitor to only the if-statement case and have it check all interesting smart pointers instead of a specific smart pointer. Propagation of interestingness across smart pointers can be handled by note tags for move/copy assignmnets and constructors. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 36 ↗ | (On Diff #346633) | nit: class name should be a noun (functions and methods are verbs) | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 83 | Probably forgotten | |
| 132–135 | We probably should have a checker in clang-tidy (maybe we already do), for situations like this. | |
| 137–159 | Is there any reason it's not a method of FindWhereConstrained? | |
| 161 | After that you have 3 distinct cases to handle. It's a good opportunity for extracting them into separate functions. | |
| 175–176 | I think it's better to IgnoreParensAndCasts instead of manual traversal. | |
| 212 | Variables are capitalized. | |
| 213 | It is a widespread pattern in LLVM to declare such variables directly in if statements: if (auto Report = bugReportOnGet(RHS)) return Report; | |
| 223 | So, situations like int *a = nullptr, *b = smart.get(); are not supported? | |
| 225 | llvm::find_if | |
| 234–241 | This level of nestedness is frowned upon. It is a good tell that the function should be refactored. if (cond1) {
  . . .
  if (cond2) {
    . . .
    if (cond3) {
      . . .
    }
  }
}
return nullptr;can be refactored into: if (!cond1) return nullptr; . . . if (!cond2) return nullptr; . . . if (!cond3) return nullptr; . . . It is easier to follow the logic if the function is composed in this manner because from the very beginning you know that else with more stuff is not going to follow. | |
Code clean up
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 137–159 | Not really. | |
| 175–176 | What is IgnoreParensAndCasts`? I didn't find it in the source code anywhere (ripgrep, that is). | |
| 223 | No it works even in that case (I have added a test for that). It's got to do with how the AST data structures are (int *a = nullptr, *b = smart.get(); is considered a single decl). | |
| 225 | Not sure if that'll work neatly since I actually need the return value of the predicate function (the report). | |
| 234–241 | Do you still think that's the case now? (After breaking it into functions). | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 175–176 | Just a typo, the actual name is IgnoreParenCasts (Expr::IgnoreParenCasts) | |
| 238 | LLVM-code style mandates no curly braces around single-line ifs. | |
| 251 | (I think this was already pointed out, but early-exits are the way to go in LLVM. const auto *DS = llvm::dyn_cast<DeclStmt>(S)); if (!DS) return nullptr; const Decl *D = DS->getSingleDecl(); const auto *VD = llvm::dyn_cast<VarDecl>(D); if (!VD) return nullptr; .... `` | |
Right, @teemperor, I will do the refactors once we figure out how to utilize trackExpressionValue() here (because that will eliminate quite a bit of code from GetNoteVisitor, so no point in refactoring those bits).
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 37 ↗ | (On Diff #347260) | Sorry for picking on it again, but is it visiting "get notes"? Otherwise, it is just a placeholder name that doesn't tell the reader what this class is actually for. | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 79 | I think it's better to use REGISTER_SET_FACTORY_WITH_PROGRAMSTATE instead and keep factory as part of the state as well. | |
| 152 | Functions are actions, it is better to express actions with verbs. | |
| 158–160 | This type of predicates shouldn't be scattered throughout the code here and there.  It should be definitely unified and put into a function that is shared with the checker and other parts of the model. | |
| 175 | Just wondering if return {}; will be sufficient here. | |
| 183–186 | I think it's better to unite these two into if (!E || E->getCastKind()...) | |
| 189–191 | I guess it escaped during code refactoring: | |
| 207–210 | Similar note here: if (!BO || BO->getOpcode()...) | |
| 213–216 | And here | |
| 469–470 | I generally don't like repeating code in both branches of the if statement. const auto *ExistingSet = State->get<ExprsFromGet>(ThisRegion); auto BaseSet = ExistingSet ? *ExistingSet : StmtSetFactory.getEmptySet(); auto NewStmtSet = StmtSetFactory.add(BaseSet, CallExpr); State = State->set<ExprsFromGet>(ThisRegion, NewStmtSet); | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 57 ↗ | (On Diff #348166) | IMO, these three visit functions are a bit confusing in their names. I guess, my expectation for a group of methods that have very similar names is that they have similar signatures. And here visitIfStmt doesn't return diagnostic piece as opposed to two other methods. | 
| 37 ↗ | (On Diff #347260) | This comment is marked as "Done", but there is no code change nor justification. | 
| 36 ↗ | (On Diff #346633) | It is again some sort of verb. I don't know how to read it except for "emit note on get"-visitor. What about something like GetNoteEmitter? I mean do we really need to put Visitor in the name? | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 79 | It is marked as "Done", but the code is same. | |
| 104 | My guess is that it should be just isStdSmartPtr | |
| 154 | I'm still here picking on names 😅 It doesn't emit "bug report" it emits "note". | |
| 181–185 | I have two major questions about this implementation: 
 | |
| 213 | It's better to use IgnoreParenCasts here as well. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 37 ↗ | (On Diff #347260) | Yes it is the line const SVal *InnerPtr = State->get<TrackedRegionMap>(ThisRegion); in the VisitNode function. This uses the TrackedRegionMap which is defined in SmartPtrModelling.cpp but the visitor must be added to the bug-report in SmartPtrChecker.cpp. | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 79 | My bad, I should have put in the TODO. | |
| 175 | Yup | |
| 469–470 | Right, thanks :) | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 36 ↗ | (On Diff #346633) | I put in Visitor because all the visitors that I have come across end with *Visitor*. But then again, I am sure that if we look, we'll find counter-examples. So your call here ... | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 37 ↗ | (On Diff #347260) | I see. IMO it still means that the visitor belongs with the checker and was put into this header as a workaround. | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 36 ↗ | (On Diff #346633) | That's true, but it doesn't mean that we shouldn't care about the names and how they read. You cannot say that you visit "emit note on get", so the name doesn't help to understand what this class does. If you want to name it GetVisitor - no problem because it does visit gets. But if you want to state in the name that it emits notes on gets, the name should say Emitter, and that second name is more specific. | 
A brief summary of an offline discussion we recently had.
(1) Basically we figured out that it's still necessary to do something like I originally suggested:
This is necessary because the symbol produced by .get() is not immediately collapsed to a constant and it remains interesting as a symbol for the entire duration of the new visitor's lifetime, but there may be unrelated .get()s on the same smart pointer during said lifetime that don't deserve a note despite producing the same symbol.
(2) We also came up with a different approach to communicating with trackExpressionValue(). First of all, we probably don't need to mark all nodes/expressions on which trackExpressionValue() switches modes as interesting; we're only interested in the spot where tracking ends. This happens because the checker fully models .get() and therefore it's impossible for a generic solution like trackExpressionValue() to proceed with tracking as that would have required checker-specific machinery. We could reduce the scope of proposal (1) by only marking the last node as interesting but I have a better idea: let's add a callback to trackExpressionValue() that's invoked once tracking ends. In our case such callback would attach a checker-specific visitor to the smart pointer which solves our problem perfectly.
Such callback could be useful in a lot more cases though, because it provides us with an extremely generic benefit of knowing the origin of the value. We already demand such knowledge in a number of other machines that are currently hard-coupled to trackExpressionValue(): namely, i'm talking about inlined defensive check suppressions. Both of these suppressions basically say "if a null/zero value originates from a nested function call that was exited before the bug node, suppress the warning". These suppressions don't care where the value was passing through, they only care where it originated from. As such, by providing a callback for the origin of the value, we could decouple these suppressions and possibly even move them into the respective checkers (eg., the null dereference checker). I think this could be an excellent refactoring pass.
| clang/lib/StaticAnalyzer/Checkers/SmartPtr.h | ||
|---|---|---|
| 61–64 ↗ | (On Diff #348166) | Typo! | 
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
| 241 | That's not what the assert is saying; the assert is saying that the DeclStmt has exactly one Decl. It basically forbids code like int x = 1, y = 2; . You may wonder why don't you crash all over the place. That's because Clang CFG creates its own DeclStmts that aren't normally present in the AST, that always have exactly one declaration. This is necessary because there may be non-trivial control flow between these declarations (due to, say, presence of operator ?: in the initializer) so they have to be represented as different elements (possibly even in different blocks) in the CFG. | |
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 241 | So I guess the tests at lines 317 and 378 of smart-ptr-text-output.cpp work because of the CFG messing with the AST? So should I remove the assert? | |
Important question from @vsavchenko:
I have two major questions about this implementation:
- Why don't we need an actual check for IfStmt? Won't it trigger on bool unused = !pointer;? And if so it won't mean constrained.
- Why do we only care about implicit pointer-to-bool conversion? What about situations like pointer == nullptr, NULL != pointer, __builtin_expect(pointer, 0), etc?
I think there's no way around re-using/generalizing the logic from ConditionBRVisitor::VisitNode in some form. I guess you could try to separate the part where it looks at the current program point and finds out what's constrained. Then apply it to the moment of time where the interesting constraint appears (whereas ConditionBRVisitor continously scans all program points with the same hopefully-reusable logic).
| clang/lib/StaticAnalyzer/Checkers/SmartPtrModeling.cpp | ||
|---|---|---|
| 241 | 
 Yes. The rest of the static analyzer works for the same reason; a lot of code relies on it. 
 The assert is correct but the message is wrong / misleading. | |

I'm not sure if we should expect 16 unique places where uptr::get() called on a path. I would guess 4 or 2 is more than enough.