This is an archive of the discontinued LLVM Phabricator instance.

[Analyzer][WebKit] UncountedLocalVarsChecker
ClosedPublic

Authored by jkorous on Jul 6 2020, 2:35 PM.

Details

Reviewers
NoQ

Diff Detail

Event Timeline

jkorous created this revision.Jul 6 2020, 2:35 PM
jkorous updated this revision to Diff 283735.Aug 6 2020, 2:21 PM
jkorous retitled this revision from WIP [Analyzer][WebKit] UncountedLocalVarsChecker to [Analyzer][WebKit] UncountedLocalVarsChecker.
jkorous edited the summary of this revision. (Show Details)
jkorous added a reviewer: NoQ.
  • added docs
NoQ added inline comments.Aug 10 2020, 10:17 PM
clang/docs/analyzer/checkers.rst
2541–2543

Why exactly is this bad but strictly nested scopes are good? You can't lexically use uncounted after the destructor is invoked. Even if you make a separate ref-counted object in this scope and start counting uncounted with it, it'll still be dead *before* the destructor of counted because destructors are guaranteed to be invoked in the opposite order.

clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
52

*tries out the new fancy phabricator suggestions*

jkorous added inline comments.Aug 11 2020, 3:54 PM
clang/docs/analyzer/checkers.rst
2541–2543

I agree that this is not sound in its current shape. IIRC the idea for this rule came from WebKit folks and we decided to give it a try. I only got this far but since it's somewhat self-contained and somewhat coherent I was thinking landing it as alpha checker might be ok. What do you think?

The thinking was that ultimately we'd be able to make it pretty much sound. The rule about the scope was more about being conservative and making implementation of the checker easier.

For the full picture of the glorious hypothetical future please:

  1. Imagine that uncounted variables could be initialized only from ref-counted variables whose names start with guard_ and are outside of the declaration scope of the uncounted variable.
  2. Imagine that we'd prohibit any modifications (assignments, resets, destructions, etc) for ref-counted variables named guard_ in scopes embedded in their declaration scope.
  3. Imagine that we might even prohibit access to the raw pointer from guard_ prefixed ref-counted variables directly in the scope they are declared. To avoid zombie-fication between their definition and the embedded scope they'd guard.
  4. Consider that we might allow definitions of uninitialized uncounted local variables only under certain conditions.
  5. Consider that we might prohibit and check for delete-ing values obtained from ref-counted variables named guard_* (and potentially uncounted local variables).
clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
52

Thanks!

That's a sweet phab feature!

jkorous updated this revision to Diff 284939.Aug 11 2020, 5:31 PM
jkorous marked an inline comment as done.

fix typo

Friendly ping

NoQ added inline comments.Aug 21 2020, 5:03 PM
clang/docs/analyzer/checkers.rst
2541–2543

Ok! I mean, i still don't understand why same-level variables are bad, but i guess a nested scope is a pretty good indication that the user has thought deeply about ownership.

clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
34

It's possible to do this linearly, in top-down manner, right? I.e., instead of looking up parents in the parent map, remember them as a context on the stack while visiting stuff. I guess that's what i would have done if i had a normal non-recursive StmtVisitor from the start; in this case i'd simply skip the nested DeclStmts when visiting if-statements and such (possibly diving into their initializers directly, not sure if that's necessary but i guess those could contain GNU StmtExprs).

But i guess it's still more or less linear given that we're looking up at most two parents every time.

86–91

Mmm, ok, this one's a lot less linear than the other one. I still feel like you should consider doing a plain StmtVisitor (which should work just fine given that you're already analyzing only function bodies), and then maintain a stack of scopes as part of your state during traversal. It might end up being a lot more straightforward and possibly faster.

234

I think there should be more explanation behind the word "unsafe". I.e., "not backed by a ref-counted variable in a larger scope" or something like that.

jkorous added inline comments.Aug 26 2020, 10:50 PM
clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
34

I could probably use a recursive StmtVisitor, filter out everything I am not interested in (seems like a lot https://clang.llvm.org/doxygen/classclang_1_1CompoundStmt.html) and visit VarDecl child nodes.

But won't the complexity be actually worse?

O(count(Stmt)) "could be >" O(count(local VarDecl))
86–91

I see what you don't like and agree that I'd be happier to have a clearer and more performant approach but I don't understand your suggestion.

Let me start by explaining how I understand the task we have here and how I implemented it. Maybe you'll see some completely different approach than I did.

We get an arbitrary pair (Guarded, MaybeGuardian) of VarDecl-s where Guarded is newly defined local variable and MaybeGuardian is used to initialize it (potentially "indirectly").

FooClass guarded = maybeguardian.get();

Informally speaking we are trying to verify that MaybeGuardian VarDecl is defined in a block that is an ancestor (== direct or indirect parent, a block is NOT ancestor to itself) to the block where Guarded VarDecl is defined.

In terms of nodes we are trying to verify that MaybeGuardian VarDecl is part of CompoundStmt which also contains the CompoundStmt which contains Guarded VarDecl. The "CompoundStmt which contains the CompoundStmt" is important as we don't allow Guarded and MaybeGuardian to be declared in the same scope/as part of the same CompoundStmt.

The way I implemented is:

  1. Find the closest CompoundStmt ancestor of MaybeGuardian VarDecl.
  2. Iterate CompoundStmt ancestors of Guarded VarDecl (from Guarded to TranslationUnitDecl).
    1. Skip the first CompoundStmt ancestors of Guarded as we don't allow Guarded and MaybeGuardian to be defined in the same scope
    2. If any further CompoundStmt ancestors of Guarded == the closest CompoundStmt ancestor of MaybeGuardian VarDecl from step 1. we've verified that the requirement about blocks is satisfied => return true;.
  3. Otherwise return false;.

Now, I guess we definitely could trade some memory for CPU cycles but I don't see any trivial solution due to these:

  • It probably doesn't make sense to run the checker from any other node than uncounted local variable being initialized - this is the Guarded VarDecl in this context. I feel that if we rewrite the checker to start by visiting() any other node that plays a role in this pattern the performance might be even worse.
  • We shouldn't expect that the MaybeGuardian has to be in the direct parent CompoundStmt of Guarded.

I can imagine that we could keep a stack of CompoundStmt and manage it in dataTraverseStmtPre() and dataTraverseStmtPost() but don't see how to use it so it'd be a clear win in terms of readability or performance.

Also, another thought - this idea is really alpha for us too in the sense that we still haven't gotten very good feel for how invasive it is. That means that it's pretty likely that it'll change and I would not be surprised if we for example decide to drop the requirement for embedded block.
I am wondering whether it makes sense to invest into refactoring now. Do you think landing it as it is would still somehow fit under the alpha label or is it expected such checkers to live out of tree until they mature enough?

Sorry to bother you @NoQ - it'd be awesome if you could take a look!

Friendly ping

NoQ added a comment.Sep 18 2020, 1:00 AM

Sry! I'll try my best to keep up from now on.

clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
34

The code is already O(Stmt) because that's what RecursiveASTVisitor does anyway, but it's more like O(count(Stmt)²) in your worst case when for every statement you ascend through all of its parents. In this case it probably doesn't matter because there are just two parents. In the other case it probably matters, but only a little bit - of course such tall ASTs don't exist in practice.

86–91

(i seem to be replying in the other comment)

and manage it in dataTraverseStmtPre() and dataTraverseStmtPost()

In case of non-recursive visitor it's just

void VisitCompoundStmt(const CompoundStmt *S) {
  // Put stuff on stack.
  VisitChildren(S);
  // Remove stuff from stack.
}

So it's very idiomatic and intuitive. Furthermore, instead of a stack you can do something fancier like ImmutableSet of all active variables that'll allow you to not iterate at all over previous layers while still keeping you O(Stmt).

But then, again, it's not a strong opinion. The code is understandable as-is and i doubt performance benefits are actually going to be noticeable.

132

I don't understand what is this bugtype trying to tell me. I'm like, "Type parameter? Is it about template arguments?"

How about "Raw pointer or reference not backed by ref-counted variable"?

clang/test/Analysis/Checkers/WebKit/uncounted-local-vars.cpp
10

You mean like a rule, "always initialize your raw pointers"? I'm not sure there's much benefit over the existing checker that finds use of uninitialized variables path-sensitively... are we that desparate(?)

102

FileCheck isn't actually running in this test file.

NoQ added a comment.Sep 19 2020, 4:39 PM

(I.e., i don't have strong opinions about high-level design and will happily push the green button as soon as minor nits are addressed)

jkorous added inline comments.Sep 21 2020, 10:35 AM
clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
132

How about "Uncounted raw pointer or reference not provably backed by ref-counted variable"?

  • We don't warn about types not related to ref-counting - for example int* would be ignored.
  • Since as of now we have the rule about scopes there are trivial examples where we'd print the warning even for pointers very obviously backed by ref-counted variable.
void foo() {
   RefPtr<Foo> rc_foo;
   Foo * raw_foo = rc_foo.get(); // warning
}
clang/test/Analysis/Checkers/WebKit/uncounted-local-vars.cpp
10

I probably wrote much less than was on my mind which is roughly that the current implementation is a set of individual rules that prevent certain classes of bugs but definitely still have many "holes" if we try using it as a system which would guarantee absence of certain classes of bugs. In the future I'd like to keep individual rules simple but close the loopholes. One such glaring loophole in particular is that we don't check for assignments to uncounted local pointer variables. The uninitialized-ness here is just a signal that we don't check that particular variable at all (as of now).

102

Ahh, thanks! I originally used FileCheck for tests of the prototype.

jkorous updated this revision to Diff 293215.EditedSep 21 2020, 10:38 AM

Nits were addressed (hopefully).

NoQ accepted this revision.Sep 21 2020, 1:36 PM

Thank you, i think this is good to land!

clang/lib/StaticAnalyzer/Checkers/WebKit/UncountedLocalVarsChecker.cpp
132

That sounds pretty awesome, i like it.

This revision is now accepted and ready to land.Sep 21 2020, 1:36 PM

Thank you Artem!

jkorous closed this revision.Sep 22 2020, 12:03 PM

I made a typo in the commit message reference so this revision didn't get closed automatically.