This is an archive of the discontinued LLVM Phabricator instance.

[ASTMatcher] Add coroutineBodyStmt matcher
ClosedPublic

Authored by ccotter on Dec 30 2022, 8:14 PM.

Details

Summary

The coroutineBodyStmt matcher matches CoroutineBodyStmt AST nodes.

Diff Detail

Event Timeline

ccotter created this revision.Dec 30 2022, 8:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 8:14 PM
Herald added a subscriber: ChuanqiXu. · View Herald Transcript
ccotter requested review of this revision.Dec 30 2022, 8:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 30 2022, 8:14 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ccotter retitled this revision from Add coroutineBodyStmt matcher to [ASTMatcher] Add coroutineBodyStmt matcher.Dec 30 2022, 8:14 PM
ccotter updated this revision to Diff 485748.Dec 30 2022, 10:00 PM
  • Add hasBody matcher support for coroutineBodyStmt

Thank you for this! FWIW, it looks like precommit CI found some issues that need to be resolved.

clang/include/clang/ASTMatchers/ASTMatchers.h
5498

I'm not certain it makes sense to me to add CoroutineBodyStmt to hasBody -- in this case, it doesn't *have* a body, it *is* the body.

clang/lib/ASTMatchers/Dynamic/Registry.cpp
171 ↗(On Diff #485748)

Please keep this sorted alphabetically.

ccotter added inline comments.Jan 9 2023, 8:37 PM
clang/include/clang/ASTMatchers/ASTMatchers.h
5498

With respect to hasBody(), my intent was to treat the CoroutineBodyStmt node as analogous to a FunctionDecl or WhileStmt. WhileStmts have information like the loop condition expression, as CoroutineBodyStmts contain the synthesized expressions such as the initial suspend Stmt. Both WhileStmts and CoroutineBodyStmts then have the getBody() methods, usually a CompoundStmt for WhileStmts and either a CompoundStmt or TryStmt for CoroutineBodyStmts.

Ultimately, I wanted to be able to match the CoroutineBodyStmt's function-body (using the grammar) from the standard, e.g., coroutineBodyStmt(hasBody(compoundStmt().bind(...))). If there is a different approach you'd recommend that's in line with the AST matcher design strategy, I can use that instead.

ccotter updated this revision to Diff 487661.Jan 9 2023, 8:39 PM
ccotter marked an inline comment as not done.
  • Add hasBody matcher support for coroutineBodyStmt
  • update doc
  • Sort list
  • clang-format
ccotter marked an inline comment as done.Jan 9 2023, 8:42 PM
ccotter updated this revision to Diff 487789.Jan 10 2023, 6:48 AM
  • Fix ParserTest.Errors
aaron.ballman added inline comments.
clang/include/clang/ASTMatchers/ASTMatchers.h
5498

What concerns me about the approach you have is that it doesn't model the AST. A coroutine body statement is a special kind of function body, so it does not itself *have* a body. So this is a bit like adding CompoundStmt to the hasBody matcher in that regard.

To me, the correct way to surface this would be to write the matcher: functionDecl(hasBody(coroutineBodyStmt())) to get to the coroutine body, and if you want to bind to the compound statement, I'd imagine functionDecl(hasBody(coroutineBodyStmt(has(compoundStmt().bind(...))))) would be the correct approach. My thinking there is that we'd use the has() traversal matcher to go from the coroutine body to any of the extra information it tracks (the compound statement, the promise, allocate/deallocate, etc).

Pinging @klimek and @sammccall for additional opinions.

ccotter added inline comments.Jan 10 2023, 7:49 AM
clang/include/clang/ASTMatchers/ASTMatchers.h
5498

I think I see. With my proposal, the match would be functionDecl(hasBody(coroutineBodyStmt(hasBody(stmt().bind(...))))). For completeness, your suggestion would need functionDecl(hasBody(coroutineBodyStmt(has(stmt(anyOf(cxxTryStmt(), compoundStmt()).bind(...)))))).

I am a bit hung up though on two things, and clarifications on both would help solidify my understanding of the design philosophy of the matchers. First, since CoroutineBodyStmt has a getBody() method, I assumed it would make it eligible for the hasBody matcher, although perhaps I am making an incorrect assumption/generalization here?

Second, the has() approach seems less accurate since we would be relying on the fact that the other children nodes of CoroutineBodyStmt (initial or final suspend point, etc) would never be a CompoundStmt or CXXTryStmt, else we might get unintentional matches. Or, one might miss the CXXTryStmt corner case.

Follow up question - should a need arise (say, authoring many clang-tidy checks that need extensive coroutine matcher support on the initial suspend point etc), would we see the matchers supporting things like coroutineBodyStmt(hasInitialSuspendPoint(...), hasFinalSuspendPoint(..)), or rely on a combination of has approach / non-ASTMatcher logic (or locally defined ASTMatchers within the clang-tidy library)?

It looks like this phab can be broken down into two changes - first, the addition of a coroutineBodyStmt matcher, and second, a hasBody traversal matcher for coroutineBodyStmt. Happy to separate these out depending on the direction of the discussion.

aaron.ballman added inline comments.Jan 10 2023, 8:23 AM
clang/include/clang/ASTMatchers/ASTMatchers.h
5498

I am a bit hung up though on two things, and clarifications on both would help solidify my understanding of the design philosophy of the matchers. First, since CoroutineBodyStmt has a getBody() method, I assumed it would make it eligible for the hasBody matcher, although perhaps I am making an incorrect assumption/generalization here?

As I understand the evolution here, we started out with hasBody() to match on function declaration AST nodes with a body. It then evolved to include do/while/for (and eventually range-based for) loops, which is a bit of an odd decision (why not anything with a substatement, like if or switch?). Now we're looking to add support for the coroutine body node, but that has a half dozen+ different things we can traverse to, one of which is the compound statement for the coroutine body. All of these cases are kind of different from one another, so the design is a bit confusing (at least to me).

Second, the has() approach seems less accurate since we would be relying on the fact that the other children nodes of CoroutineBodyStmt (initial or final suspend point, etc) would never be a CompoundStmt or CXXTryStmt, else we might get unintentional matches. Or, one might miss the CXXTryStmt corner case.

That's a fantastic point that I hadn't considered. We probably will want named traversal methods to go from the coroutine body statement to the other kinds of AST nodes it supports. And based on that, hasBody() is a quite reasonable name because that's one of the other kinds of AST nodes.

Follow up question - should a need arise (say, authoring many clang-tidy checks that need extensive coroutine matcher support on the initial suspend point etc), would we see the matchers supporting things like coroutineBodyStmt(hasInitialSuspendPoint(...), hasFinalSuspendPoint(..)), or rely on a combination of has approach / non-ASTMatcher logic (or locally defined ASTMatchers within the clang-tidy library)?

I'm leaning more towards introducing new traversal matchers. In terms of what goes into ASTMatchers.h, we usually only add matchers once there are multiple in-tree needs for the matcher, otherwise we recommend using a locally defined matcher. ASTMatchers.h is quite expensive to compile due to all the template shenanigans, and so we've resisted adding matchers unless they're "needed" to try to keep compile times reasonable.

Hmmm, I'm coming around to reusing hasBody() like you've done here. I think it's pretty weird to read: functionDecl(hasBody(coroutineBodyStmt(hasBody(stmt().bind(...))) because of the duplicated calls to hasBody, but I think it's more defensible now than I was originally thinking. I'm still curious if other folks who care about AST matchers have an opinion given that this is a bit trickier than usual.

Bump - anyone else have any thoughts here?

Hmmm, I'm coming around to reusing hasBody() like you've done here. I think it's pretty weird to read: functionDecl(hasBody(coroutineBodyStmt(hasBody(stmt().bind(...))) because of the duplicated calls to hasBody, but I think it's more defensible now than I was originally thinking. I'm still curious if other folks who care about AST matchers have an opinion given that this is a bit trickier than usual.

I think I'm okay with this direction, but one last try at pinging @klimek and @sammccall in case they've got an opinion. If we don't hear from them by Thur of this week, I'd say it's fine to move forward with the patch as-is.

clang/docs/ReleaseNotes.rst
871 ↗(On Diff #487789)

You should also mention that the hasBody matcher now works with coroutine bodies.

It seems pretty weird to me that the two child edges from functiondecl -> coroutinebodystmt -> compoundstmt are both called "body".

However that *is* what they're called in the AST today, and having the matchers correspond to the AST's names seems important.
So this seems OK to me (renaming the inner "body" edge to something else seems like a good idea, but is out of scope here).

ccotter updated this revision to Diff 500995.Feb 27 2023, 6:21 PM
  • update release note, rebase
ccotter updated this revision to Diff 500996.Feb 27 2023, 6:22 PM

fix bad 'arc diff'

ccotter marked an inline comment as done.Feb 27 2023, 6:24 PM
aaron.ballman accepted this revision.Feb 28 2023, 5:11 AM

LGTM! Thanks @sammccall for weighing in, I appreciate it. :-)

This revision is now accepted and ready to land.Feb 28 2023, 5:11 AM

Thanks both! Assuming this passes build, would one of you mind committing this for me? Chris Cotter <ccotter14@bloomberg.net>

This revision was landed with ongoing or failed builds.Mar 1 2023, 8:52 AM
This revision was automatically updated to reflect the committed changes.

Thanks both! Assuming this passes build, would one of you mind committing this for me? Chris Cotter <ccotter14@bloomberg.net>

Happy to do so! I've landed it on your behalf in dcf5ad89dcc5cc69b9df3e5dd9be71c65642f519