This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Add better tracking for RetainCountChecker leak warnings
Needs ReviewPublic

Authored by vsavchenko on Jun 11 2021, 11:08 AM.

Details

Summary

RetainCountChecker models a group of calls that are interpreted as
"identity" functions. bugreporter::Tracker couldn't see through
such calls and pick the right argument for tracking.

This commit introduces a new tag that helps to keep this information
from the actual analysis and reuse it later for tracking purposes.

So, in short, when we see a call foo(x) that we model as
"identity", we now add a tag that is saying that the result of
foo(x) is essentially x. When the tracker, later on, tries to
track value from foo(x), we point out that it should track x.

Diff Detail

Event Timeline

vsavchenko created this revision.Jun 11 2021, 11:08 AM
vsavchenko requested review of this revision.Jun 11 2021, 11:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 11 2021, 11:08 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
NoQ added inline comments.Jun 11 2021, 11:55 AM
clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
991–995

I guess this part should ultimately be written in one place, eg. ExplodedNode::findTag() or something like that.

I'd also really like to explore the possibility to further limit the variety of nodes traversed here. What nodes are typically traversed here? Is it checker-tagged nodes or is it purge dead symbol nodes or something else?

Rebase

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
991–995

Yes, ExplodedNode::findTag() sounds like a great idea!

I mean it is hard to tell without calculating statistics right here and running it on a bunch of projects. However, it is always possible to write the code that will have it the other way. My take on it is that it is probably a mix of things.

I'd also prefer to traverse less, do you have any specific ideas here?

Szelethus added a comment.EditedJun 17 2021, 12:16 AM

Aha, alright! So the tracker tracked back to where the tracked expression got computed in node N, which is the return value of some identity function. Unless its explicitly told that there is a link in between the return value and the parameter, the tracker can't figure this out that it could track further (why would it, right?). So it asks its handlers whether they want to do anything with N, which IdentityHandler does -- it digs out the node where the parameter was evaluated, and tells the tracker to go on with that. Awesome!

I guess we could put IdentityHandler in an anonymous namespace?

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp
945–947

I don't know ObjC at all, but is this identity obvious? Do you not need a NoteTag to say that "Foo is an identity function, its return value equals the parameter"? Or, in the possession of the actual PathSensitiveBugReport that you can retrieve in the handler, you could drop a note there, maybe?

Aha, alright! So the tracker tracked back to where the tracked expression got computed in node N, which is the return value of some identity function. Unless its explicitly told that there is a link in between the return value and the parameter, the tracker can't figure this out that it could track further (why would it, right?). So it asks its handlers whether they want to do anything with N, which IdentityHandler does -- it digs out the node where the parameter was evaluated, and tells the tracker to go on with that. Awesome!

That's exactly it!

I guess we could put IdentityHandler in an anonymous namespace?

Sure thing!

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp
945–947

These are usually some type of casts. And we already provide notes when this value gets into some other region (like 'newPtr' initialized to the value of 'originalPtr'). I think that they are enough. And if we want something better, I think we need to tweak that messaging (StoreHandler now allows that) instead of crowding user with even more notes.

Move IdentityHandler into the anonymous namespace

Szelethus added inline comments.Jun 17 2021, 3:02 AM
clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountChecker.cpp
945–947

Fair enough!

clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
991–995

I'd also really like to explore the possibility to further limit the variety of nodes traversed here. What nodes are typically traversed here? Is it checker-tagged nodes or is it purge dead symbol nodes or something else?

Is there something I'm not seeing here? Trackers basically ascend a path from the error node to at most the root of the ExplodedGraph (not the trimmed version, as Tracker::track() is called before any of that process happens), so its not much slower than trackExpressionValue, right?

Or does this, and likely many other future handlers run such a loop more often then what I imagine?

vsavchenko added inline comments.Jun 17 2021, 6:15 AM
clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
991–995

Essentially, yes. trackExpressionValue (and Tracker::track) ascend the graph searching for a node where the give expression is evaluated. The problem here is that whenever we ask to track some expression, not only trackExpressionValue will do its traversal (much longer), but also this handler will do its (shorter). It would be better not to do this altogether, but I personally don't see any solutions how we can cut on this part.
At the same time, I don't think that it is performance critical, considering that the part of the graph where N->getStmtForDiagnostics() == E is small.

Szelethus added inline comments.Jun 22 2021, 6:49 AM
clang/lib/StaticAnalyzer/Checkers/RetainCountChecker/RetainCountDiagnostics.cpp
991–995

Ah, so, this is your patch:

E: Error Node
C: Identity function call
V: Argument evaluation
A: Origin of the argument value
---->: Nodes traversed by the tracker
====>: Nodes traversed by the handler

*              *           *           *
E->O->O->O->O->C->O->O->O->V->O->O->O->A->X->X->X

--------------->          
               ============>----------->

This is fine, because its just a single ascend, made a little worse by the fact that we do it for each tracked variable. But later, with more handlers, it might get worse (per variable):

*              *           *           *
E->O->O->O->O->C->O->O->O->V->O->O->O->A->X->X->X

--------------->           
               ============>----------->
               ===============>----->
               ==============>------------------>
               =================================>
               ======================>

Now, I can't immediately imagine a case where someone would add THAT many handlers, or track that many variables, even in extreme circumstances, but I don't posses a crystal ball either :^)