This is an archive of the discontinued LLVM Phabricator instance.

[ast-matchers] Add hasSubstatement() traversal matcher for caseStmt(), defaultStmt(), labelStmt()
AbandonedPublic

Authored by LegalizeAdulthood on Dec 27 2021, 11:21 PM.

Details

Summary

Previously if you wanted to match the statement associated with
a case, default, or labelled statement, you had to use hasDescendant.
Review comments stated that hasDescendant is heavy-handed and that a
more specific traversal matcher was preferred. This is that
traversal matcher.

  • Separate matcher definitions with a blank line.
  • Add corresponding unit tests for new traversal matcher.
  • Add the new matcher to the Registry
  • Add documentation in the AST matchers reference HTML page.
  • Remove trailing semi-colon from REGISTER_MATCHER macro to be consistent with the other macro declarations and fix a missing semi-colon in the invocation of the macro.

Diff Detail

Unit TestsFailed

Event Timeline

LegalizeAdulthood requested review of this revision.Dec 27 2021, 11:21 PM
LegalizeAdulthood created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptDec 27 2021, 11:21 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
LegalizeAdulthood edited the summary of this revision. (Show Details)Dec 27 2021, 11:23 PM
LegalizeAdulthood edited the summary of this revision. (Show Details)

I think while implementing this I got a little confused as to whether or not it was a narrowing or traversal matcher. It feels like a traversal matcher to me.

If someone would confirm my reasoning, I'd appreciate it. That means I should probably move the unit tests to the traversal tests file.

Update based on comments from lint check

LegalizeAdulthood retitled this revision from [ast-matchers] Add hasSubstmt() traversal matcher for caseStmt(), defaultStmt() to [ast-matchers] Add hasSubstmt() traversal matcher for caseStmt(), defaultStmt(), labelStmt().
LegalizeAdulthood edited the summary of this revision. (Show Details)

Update to include labelStmt()
Move the unit tests to ASTMatchersTraversalTest.cpp

Correct HTML copy/paste error for labelStmt() traversal

clang/lib/ASTMatchers/Dynamic/Registry.cpp
358

I see there is another matcher called hasAnySubstatement where "substatement" is spelled out; I think I will rename this matcher hasSubstatement to be consistent with hasAnySubstatement.

LegalizeAdulthood retitled this revision from [ast-matchers] Add hasSubstmt() traversal matcher for caseStmt(), defaultStmt(), labelStmt() to [ast-matchers] Add hasSubstatement() traversal matcher for caseStmt(), defaultStmt(), labelStmt().

Rename hasSubstmt to hasSubstatement

Document the new matcher in the clang release notes

Previously if you wanted to match the statement associated with a case, default, or labelled statement, you had to use hasDescendant.

This isn't true. You can use has() to do direct substatemematching, and I'm wondering whether we need this new matcher at all. e.g., https://godbolt.org/z/K5sYP757r

clang/include/clang/ASTMatchers/ASTMatchers.h
2432

Spurious newline?

7720

May need to reformat as well, but the intent is to make it clear that this does not behave like hasAnySubstatement().

7727

This is invalid code (there's nothing to break out of), so you may want to tweak the example a bit.

7730–7734
7737–7738

Pretty sure you can use the base class here as a simplification.

clang/lib/ASTMatchers/Dynamic/Registry.cpp
69

Good catch; feel free to land this and related changes as an NFC commit.

clang/unittests/ASTMatchers/ASTMatchersTraversalTest.cpp
161

Should fix the formatting.

LegalizeAdulthood marked an inline comment as done.EditedJan 5 2022, 12:22 PM

Previously if you wanted to match the statement associated with a case, default, or labelled statement, you had to use hasDescendant.

This isn't true. You can use has() to do direct substatemematching, and I'm wondering whether we need this new matcher at all. e.g., https://godbolt.org/z/K5sYP757r

Well, the whole point of this was that previous review comments were complaining that I was using an "expensive" matcher
instead of one that got right to the point. (What's the difference between has and hasDescendent anyway?)

I looked at the implementation of "has" and it has a huge amount of machinery underneath it. I drilled in as follows (my ToT hash is 773ab3c6655f4d2beec25bb3516b4d4fe2eea990):
ASTMatchers.h:3391 declares has as an instance of internal::ArgumentAdaptingMatcherFunc<internal::HasMatcher>
ArgumentAdaptingMatcherFun appears just to be a factory mechanism for creating the appropriate matcher.
HasMatcher seems to do the actual matching
HasMatcher<T, ChildT>::matches delegates to ASTMatchFinder::matchesChildOf(const T &Node, ...)
ASTMatchFinder::matchesChildOf(const T &Node, ...) delegates to another overload of matchesChildOf(const DynTypedNode &Node, ...)` after asserting a static type relationship (ASTMatchersInternal.h:762)
ASTMatchFinder::matchesChildOf(const DynTypedNode &Node, ...) is a virtual function with one implementation in MatchASTVisitor (ASTMatchFinder.cpp:659)
MatchASTVisitor::matchesChildOf delegates to memoizedMatchesRecursively (ASTMatchFinder.cpp:664)
For non-memoizable nodes, MatchASTVisitor::memoizedMatchesRecursively delegates to matchesRecursively (ASTMatchFinder.cpp:599)
MatchASTVIsitor::matchesRecursively instantiates a ASTNodeNotSpelledInSourceScope and a MatchChildASTVisitor upon which it calls findMatch (ASTMatchFinder.cpp:645)
MatchChildASTVisitor::findMatch does a series of DynNode.get<T> checks and delegates to the appropriate overload of MatchChildASTVisitor::traverse, in our case it would be the one for Stmt
MatchChildASTVisitor::traverse<T>(const T &Node) delegates to MatchChildASTVisitor::match<T>(const T &Node)
MatchChildASTVisitor::match<T>(const T &Node) instantiates a BoundNodesTreeBuilder and calls match on the held Matcher.

Honestly, at this point, I become lost in manually tracing the code through "go to definition" in my IDE. So I switched my unit test for HasCaseSubstmt to use has and drilled in through the debugger.
What I saw echoes the above, although I took the wrong branch in my manual analysis in memoizedMatchesRecursively, it goes down the memoization path.

However, the whole point of memoizing a result is because that result was expensive to compute and the memoization saves time on subsequent queries
because the result has been memorized. In this case, it's a simple getter on CaseStmt to obtain the node against which you want to match, so I don't
see the memoization as having any particular benefit. (The inner matcher to hasSubstatement may be expensive and could be memoized.) For results
obtain by the visitor pattern, I can see why you'd want to memoize them as there is lots of code being executed to implement the Visitor pattern.

clang/include/clang/ASTMatchers/ASTMatchers.h
2432

I didn't add this intentionally and I can remove it, but the general style in this file is to separate top-level decls by a blank line.

LegalizeAdulthood marked an inline comment as done.Jan 5 2022, 12:24 PM
aaron.ballman added a subscriber: sammccall.

Previously if you wanted to match the statement associated with a case, default, or labelled statement, you had to use hasDescendant.

This isn't true. You can use has() to do direct substatemematching, and I'm wondering whether we need this new matcher at all. e.g., https://godbolt.org/z/K5sYP757r

Well, the whole point of this was that previous review comments were complaining that I was using an "expensive" matcher
instead of one that got right to the point. (What's the difference between has and hasDescendent anyway?)

has() matches the direct descendant AST node, whereas hasDescendant() matches *any* descendant AST node. So hasDescendant() actually is expensive -- it can traverse the AST more times than you'd expect. I don't know of any performance concerns with has() though.

e.g., https://godbolt.org/z/n3adEz4qW (the first query matches nothing, the second query has a match)

I looked at the implementation of "has" and it has a huge amount of machinery underneath it. I drilled in as follows (my ToT hash is 773ab3c6655f4d2beec25bb3516b4d4fe2eea990):
ASTMatchers.h:3391 declares has as an instance of internal::ArgumentAdaptingMatcherFunc<internal::HasMatcher>
ArgumentAdaptingMatcherFun appears just to be a factory mechanism for creating the appropriate matcher.
HasMatcher seems to do the actual matching
HasMatcher<T, ChildT>::matches delegates to ASTMatchFinder::matchesChildOf(const T &Node, ...)
ASTMatchFinder::matchesChildOf(const T &Node, ...) delegates to another overload of matchesChildOf(const DynTypedNode &Node, ...)` after asserting a static type relationship (ASTMatchersInternal.h:762)
ASTMatchFinder::matchesChildOf(const DynTypedNode &Node, ...) is a virtual function with one implementation in MatchASTVisitor (ASTMatchFinder.cpp:659)
MatchASTVisitor::matchesChildOf delegates to memoizedMatchesRecursively (ASTMatchFinder.cpp:664)
For non-memoizable nodes, MatchASTVisitor::memoizedMatchesRecursively delegates to matchesRecursively (ASTMatchFinder.cpp:599)
MatchASTVIsitor::matchesRecursively instantiates a ASTNodeNotSpelledInSourceScope and a MatchChildASTVisitor upon which it calls findMatch (ASTMatchFinder.cpp:645)
MatchChildASTVisitor::findMatch does a series of DynNode.get<T> checks and delegates to the appropriate overload of MatchChildASTVisitor::traverse, in our case it would be the one for Stmt
MatchChildASTVisitor::traverse<T>(const T &Node) delegates to MatchChildASTVisitor::match<T>(const T &Node)
MatchChildASTVisitor::match<T>(const T &Node) instantiates a BoundNodesTreeBuilder and calls match on the held Matcher.

Honestly, at this point, I become lost in manually tracing the code through "go to definition" in my IDE.

I'm impressed you got this far -- the AST matching machinery under the hood is really quite complex! :-)

So I switched my unit test for HasCaseSubstmt to use has and drilled in through the debugger.
What I saw echoes the above, although I took the wrong branch in my manual analysis in memoizedMatchesRecursively, it goes down the memoization path.

Phew, that's good -- I would expect has to use the memoization pass.

However, the whole point of memoizing a result is because that result was expensive to compute and the memoization saves time on subsequent queries
because the result has been memorized. In this case, it's a simple getter on CaseStmt to obtain the node against which you want to match, so I don't
see the memoization as having any particular benefit. (The inner matcher to hasSubstatement may be expensive and could be memoized.) For results
obtain by the visitor pattern, I can see why you'd want to memoize them as there is lots of code being executed to implement the Visitor pattern.

My understanding (and my recollection here may be wrong) is that has() can memoize the results because the sub-matcher will always be matched against the immediate direct descendant AST node, but you can't get the same behavior from hasDescendant() or hasAncestor() because any of the descendant (or ancestor) nodes can be matched, so you have to traverse them instead of memoizing. I'm adding @sammccall in case he has information about the performance characteristics or thoughts about the proposed matcher in general.

has() matches the direct descendant AST node, whereas hasDescendant() matches *any* descendant AST node.
[...] I don't know of any performance concerns with has() though.

Thanks for the explanation!

Honestly, at this point, I become lost in manually tracing the code through "go to definition" in my IDE.

I'm impressed you got this far -- the AST matching machinery under the hood is really quite complex! :-)

Having CMake generate a VS project and drilling in with ReSharper for C++ is
pretty powerful :), but you have to manually keep track of how the template
and function arguments are being resolved in the dynamic execution. So it's
pretty easy to get lost in template oriented code, which is understandably
pretty common in AST land.

My understanding (and my recollection here may be wrong) is that has() can memoize the results
[...] but you can't get the same behavior from hasDescendant() or hasAncestor()

OK, that makes sense.

I'm adding @sammccall in case he has information about the performance characteristics or thoughts about
the proposed matcher in general.

So let me summarize what we've learned so far:

  • has only descends one level, is memoizable, and is less expensive than hasDescendant
  • has uses the Visitor pattern to compute the result that is memoized
  • @sammccall might be aware of any performance related issues with has

My takeaway:

  • if has isn't expensive, I can either ditch this public matcher or move it to be a private matcher used in my check

Ping.

This review has been waiting for a week without any comment.

My takeaway:

  • if has isn't expensive, I can either ditch this public matcher or move it to be a private matcher used in my check

I don't think has() is particularly expensive, so I think you should be able to use it directly. However, if you profiled something and notice a measurable difference between has() and hasSubstatement() in practice, that would be really good for the community to know.

My takeaway:

  • if has isn't expensive, I can either ditch this public matcher or move it to be a private matcher used in my check

[...] if you profiled something and notice a measurable difference
between has() and hasSubstatement() in practice, that would
be really good for the community to know.

Do we have some sort of benchmarking facility in place, or do I
have to homebrew something?

My takeaway:

  • if has isn't expensive, I can either ditch this public matcher or move it to be a private matcher used in my check

[...] if you profiled something and notice a measurable difference
between has() and hasSubstatement() in practice, that would
be really good for the community to know.

Do we have some sort of benchmarking facility in place, or do I
have to homebrew something?

I've always homebrewed it, but if we have better facilities these days, I'd love to know about them!

Abandoning this in favor of a private matcher where it is needed.