Details
- Reviewers
NoQ vsavchenko
Diff Detail
Event Timeline
Hi Artem,
I rewrote the first two WebKit checkers: https://reviews.llvm.org/D77177 & https://reviews.llvm.org/D77178
It took me way more time than I expected and at this point I don't think that using matchers here is a net benefit.
Unless you tell me I'm holding it wrong or you and/or other analyzer folks feel strongly about this I'd rather use visitors :-(
The specific reasons why I'd prefer visitors are:
- Debug-ability. Especially false negatives (i. e. things that I'd like the matcher to match but it doesn't) are just so much harder to debug with matchers. I usually ended up just bisecting the matcher and iteratively recompiling and rerunning tests. I assume this largely predicts how maintenance of the checkers will be.
- (I might be just ignorant about this.) Less expressive power with currently implemented matchers. I have couple issues I haven't been able to solve:
- How to write a matcher that is filtering on parent AST nodes and matching children nodes without having to traverse the AST in children-to-parents direction. Example: for any function that has parameters of type pointer to ref-countable type that is called with "unsafe" arguments I'd like to match ALL the arguments so I can produce warnings.
IIUC the matcher needs to start from arguments (Expr), traverse up to the CallExpr and potentially down again to find callee and parameter types. While the traversal matcher for the parent-to-children direction are mostly implemented (e. g. forEachArgumentWithParam), I don't see any that would have semantics specific enough so I could use it here (hasParent won't help much).
Also - running the matcher on every Expr sounds more expensive then just running a visitor callback on CallExpr-s. - Some of the checks are basically matching "an arbitrary sequence of AST nodes from a given set followed by a specific kind of node". Again, maybe I just missed this but I don't know how to express this with matchers.
- Implementation of some of the checkers is a simple DAG rather than a tree in the sense that for certain branches the logic is identical. IIUC checkers are always trees. I could probably just use variables and place them to relevant nodes in the tree to avoid duplicating the logic but it still doesn't feel very natural. I would prefer to not mix matchers and visitors within WebKit checkers - not only because I would like to avoid having duplicate implementations of semantic model of WebKit smart pointers.
- How to write a matcher that is filtering on parent AST nodes and matching children nodes without having to traverse the AST in children-to-parents direction. Example: for any function that has parameters of type pointer to ref-countable type that is called with "unsafe" arguments I'd like to match ALL the arguments so I can produce warnings.
- While I personally like declarative style I am not fully convinced that matchers are more readable than just a simple visitor callback implementation esp. within clang codebase context.
- The macros and variadic templates in matchers implementation lead to pretty terrible compiler error messages.
All in all the only clear benefits of replacing visitors with matchers I see are:
- having less code (although with the matcher callback boilerplate not that much less)
- side-effect of new matchers being available for other people
IMO that's less than the costs.
+Valeriy, our new guy who already knows a lot about matchers :)
Hmm, can we have code side-by-side, i.e. a patch that changes from a visitor-based implementation to a matcher-based implementation? It'll be much easier to reason about readability changes this way.
Debug-ability. Especially false negatives (i. e. things that I'd like the matcher to match but it doesn't) are just so much harder to debug with matchers. I usually ended up just bisecting the matcher and iteratively recompiling and rerunning tests. I assume this largely predicts how maintenance of the checkers will be.
(...)
The macros and variadic templates in matchers implementation lead to pretty terrible compiler error messages.
This was my early experience as well but this quickly gets better as you gather more intuition.
I could probably just use variables and place them to relevant nodes in the tree to avoid duplicating the logic but it still doesn't feel very natural.
This is the natural, intended way of doing things. Put common sub-matchers into variables and re-use them when they otherwise get duplicated. Putting matcher into a variable also has a perk of being able to give the matcher a descriptive name, making it easier to read the code that uses it. I also don't think there's any unreasonable performance cost associated with it.
There's a separate story where you want the matched sub-trees to actually be the same node, as opposed to merely matching the same matcher; you can achieve it with equalsBoundNode().
How to write a matcher that is filtering on parent AST nodes and matching children nodes without having to traverse the AST in children-to-parents direction. Example: for any function that has parameters of type pointer to ref-countable type that is called with "unsafe" arguments I'd like to match ALL the arguments so I can produce warnings.
Hmm, i don't understand what the problem is, i.e. what prevents you from starting with callExpr(...) and putting all the checks inside it in a top-down manner?
Some of the checks are basically matching "an arbitrary sequence of AST nodes from a given set followed by a specific kind of node". Again, maybe I just missed this but I don't know how to express this with matchers.
You mean something like IgnoreImpCasts() but ignore something else instead? There's probably no out-of-the-box solution indeed, might need a custom matcher a-la ignoreImpCasts().
While I personally like declarative style I am not fully convinced that matchers are more readable than just a simple visitor callback implementation esp. within clang codebase context.
Another good thing about matchers is that because matchers are less omnipotent, it's immediately obvious by looking at the code whether it does something normal or does something weird. Most of the time it's immediately clear how the tree is traversed. Having arbitrary C++ in any visitor callback gives me a lot of reader anxiety :)
clang/lib/StaticAnalyzer/Checkers/WebKit/NoUncountedMembersChecker.cpp | ||
---|---|---|
46 | A lot of these allOf() can be omitted because they're implied when you stuff multiple matchers into one matcher. |
Hi Jan,
Debug-ability. Especially false negatives (i. e. things that I'd like the matcher to match but it doesn't) are just so much harder to debug with matchers. I usually ended up just bisecting the matcher and iteratively recompiling and rerunning tests. I assume this largely predicts how maintenance of the checkers will be.
If you are not debugging your own new matchers (or you know that the problem is not there), there is a good utility tool called clang-query. I think you might save plenty of time recompiling your code by simply trying matchers there first.
I'm with Artem (@NoQ) on writing ASTMatchers instead of Visitors.
clang/include/clang/ASTMatchers/ASTMatchers.h | ||
---|---|---|
2846 | There is some sort of naming convention for matchers like this: hasAny*, so I suggest hasAnyBase here. | |
2849 | I don't really see why that should be done in the Finder. // Forward declarations don't have bases and we'll crash if we ask for those. if (!Node.isThisDeclarationADefinition()) return false; return matchesFirstInRange(InnerMatcher, Node.bases_begin(), Node.bases_end(), Finder, Builder); NOTE: when dealing with lists of things we should be particularly careful with the Builder and bindings from the inner matcher. So it's good to use matchesFirstInRange that takes care of this problem for you. | |
2855 | Here you already _declared_ Node as CXXBaseSpecifier. There is no need to cast (and especially dyn_cast) in the matcher's body. The same applies to other matchers. | |
2856 | Because this matcher already accepts only CXXBaseSpecifiers, it is a bit of a naming overkill to go with isPublicBase instead of isPublic. inline AccessSpecifier getAccessSpecifier(const Decl &D) { return D.getAccess(); } inline AccessSpecifier getAccessSpecifier(const CXXBaseSpecifier &Base) { return Base.getAccessSpecifier(); } /// Matches public C++ declarations. /// /// Given /// \code /// class C { /// public: int a; /// protected: int b; /// private: int c; /// }; /// \endcode /// fieldDecl(isPublic()) /// matches 'int a;' /// /// TODO: document the base specifier part AST_POLYMORPHIC_MATCHER(isPublic, AST_POLYMORPHIC_SUPPORTED_TYPES(Decl, CXXBaseSpecifier)) { return getAccessSpecifier(Node) == AS_public; } Artem (@NoQ ), what do you think about this merge? | |
2894 | This matcher seems unnecessary :( It does casts (it is similar to having casts in a Visitor), and it has repetition. Probably you can implement this using other matchers. I'll look into it and try to think of a better solution. | |
clang/include/clang/StaticAnalyzer/Checkers/Checkers.td | ||
1523 | Looks like this checker got lost in the process :( | |
clang/lib/StaticAnalyzer/Checkers/WebKit/NoUncountedMembersChecker.cpp | ||
43 | CXXRecordDecl is either class, struct or union. So this part can be easily reimplemented as unless(isUnion()) | |
46 | +1 | |
63 | I would also extract a part that checks that class has one of those special methods. So it looks something like this: DeclarationMatcher ClassWithRefOrDerefMethodsMatcher = cxxRecordDecl( unless(isUnion()), hasMethod(cxxMethodDecl(anyOf(hasName("ref"), hasName("deref")), isPublic()))); DeclarationMatcher RefCountableClassMatcher = cxxRecordDecl(anyOf( ClassWithRefOrDerefMethodsMatcher, hasABaseSpec(cxxBaseSpecifier( isPublicBase(), hasType(ClassWithRefOrDerefMethodsMatcher))))); |
clang/include/clang/ASTMatchers/ASTMatchers.h | ||
---|---|---|
2894 | I think it's also important to note that this matcher seems not to be generic enough to be included in the main matchers library. TypeMatcher typeOfRecord(DeclarationMatcher InnerMatcher) { TypeMatcher RecordTypeMatcher = recordType(hasDeclaration(InnerMatcher)); return anyOf(RecordTypeMatcher, substTemplateTypeParmType( hasReplacementType(RecordTypeMatcher))); } It works just as good, takes less place and resolves the issue with duplication. |
Please compare
shouldSkipDecl () and visitCXXRecordDecl() in https://reviews.llvm.org/D77177#change-V7KFc7wNp5n3
against
checkASTDecl in https://reviews.llvm.org/D78165#change-V7KFc7wNp5n3 plus implementation in ASTMatchers.h https://reviews.llvm.org/D78165#change-zIDgIjrTgBsu
There's a separate story where you want the matched sub-trees to actually be the same node, as opposed to merely matching the same matcher; you can achieve it with equalsBoundNode().
Hopefully I'll never need this but good to know it exists!
How to write a matcher that is filtering on parent AST nodes and matching children nodes without having to traverse the AST in children-to-parents direction. Example: for any function that has parameters of type pointer to ref-countable type that is called with "unsafe" arguments I'd like to match ALL the arguments so I can produce warnings.
Hmm, i don't understand what the problem is, i.e. what prevents you from starting with callExpr(...) and putting all the checks inside it in a top-down manner?
Nvm - I figured it out. With forEach/ forEachArgumentWithParam I get a match for each of the matching child nodes.
Some of the checks are basically matching "an arbitrary sequence of AST nodes from a given set followed by a specific kind of node". Again, maybe I just missed this but I don't know how to express this with matchers.
You mean something like IgnoreImpCasts() but ignore something else instead? There's probably no out-of-the-box solution indeed, might need a custom matcher a-la ignoreImpCasts().
Pretty much. I definitely can take the current implementation and hide it behind matcher interface but I don't think it helps anything and just makes the debuggability/maintenance worse.
All in all, I am still missing a convincing argument for why to use matchers. Given that the conversion from visitors would take nontrivial extra time which I should be able to justify spending on this and since based on this experiment I find the debugging/maintenance of matchers requires more effort I'd go with visitors.
Now, if you told me that there's say a long-term effort of migration all the checkers to visitors I'd accept that but that's not the case, right?
Hi Valeriy!
Debug-ability. Especially false negatives (i. e. things that I'd like the matcher to match but it doesn't) are just so much harder to debug with matchers. I usually ended up just bisecting the matcher and iteratively recompiling and rerunning tests. I assume this largely predicts how maintenance of the checkers will be.
If you are not debugging your own new matchers (or you know that the problem is not there), there is a good utility tool called clang-query. I think you might save plenty of time recompiling your code by simply trying matchers there first.
Thank you for the advice but unfortunately I was mostly debugging my matchers :(
I'm with Artem (@NoQ) on writing ASTMatchers instead of Visitors.
clang/include/clang/ASTMatchers/ASTMatchers.h | ||
---|---|---|
2849 | Would this actually go through bases transitively? |
clang/include/clang/ASTMatchers/ASTMatchers.h | ||
---|---|---|
2849 | You are right, it won't. In this case, I guess, it would be good to add a couple of tests covering transitive traversal of base classes. I think it should be reflected in the name of the matcher (and if we have a transitive version, we should have a regular version as well). Additionally, I still don't see why that should be a part of MatchFinder. |
That's definitely not the case. We're definitely not forcing anybody to use ASTMatchers (especially given that they're not omnipotent), especially when it means rewriting large amounts of code. Even regardless of whether matchers are good or bad, if you don't feel like rewriting your checkers with matchers, I still want them in. I hoped your experience with matchers to be immediately inspiring but if it's not the case then let's proceed with your original code instead.
That said, i love the converted code. It's not only more compact but much easier to understand, which excites me as a believer in "code is read more often than it's written". The idea behind the code is immediately obvious and all the gory details are isolated and hidden. I still feel like encouraging the use of matchers for code that's, well, not written yet.
Ok, I'll do that. Thanks a bunch for this review - I appreciate it!
That said, i love the converted code. It's not only more compact but much easier to understand, which excites me as a believer in "code is read more often than it's written". The idea behind the code is immediately obvious and all the gory details are isolated and hidden. I still feel like encouraging the use of matchers for code that's, well, not written yet.
This is actually pretty funny as yet again I totally agree with someone on the high level principles but then we get to different conclusions :)
I'm very glad I had this opportunity to try out the matchers! And since I was hearing almost universal praise of the matchers which sadly didn't match my own experience (pun intended) I've spent a fair amount of time thinking about why is that.
Part of it is just contextual - that for matchers I'd have to implement functionality for which clang AST already has existing counterparts so it's not really a fair comparison. But I also feel that matchers kind of sweep the complexity under the carpet and present superficial idea of simplicity. This is great for cursory reading of the code but adds extra complexity when someone needs to understand the code on a debugging level. In my experience a structurally simple code with respect to limits of commonly used tools (debugger, grep, ...) is easier to maintain than a code that has a simple textual representation.
Since I've already done the work I'll polish the couple generic matchers from this patch and will land them separately.
There is some sort of naming convention for matchers like this: hasAny*, so I suggest hasAnyBase here.