Enable default caching of actionsin MatchSwitch so that for each
unique statement matchers are invoked at most once.
Details
- Reviewers
ymandel xazax.hun gribozavr2
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
What is the motivation for stashing the results of a match on a statement? Do we expect to encounter the same statement often?
I thought the concern was more specific to re-matching particular types, like std::optional. For that, we could lazily store the declaration nodes (like we do in https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/FlowSensitive/Models/ChromiumCheckModel.cpp with CheckDecls). But, I'm less clear on the impact of storing the result of the match itself.
For the decls concern, I think we'll want evidence of the cost before we try to optimize it. The hasName matcher is already optimized, so I'm not sure how much we have to gain here. Something, but probably not a lot.
clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h | ||
---|---|---|
159 | What is this function doing? Please add a comment. Also, I don't understand how it compiles -- the return type is std::function<void(State &)> but it returns a lambda of type (State &S) -> std::function<void(State &)>? |
The matcher is evaluated for each fixpoint iteration so I expect that we'll encounter the same statement as many times as there are iterations over the same basic block. Does that make sense?
I thought the concern was more specific to re-matching particular types, like std::optional. For that, we could lazily store the declaration nodes (like we do in https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/FlowSensitive/Models/ChromiumCheckModel.cpp with CheckDecls). But, I'm less clear on the impact of storing the result of the match itself.
I was thinking that maybe we can split the problem in two:
- duplicated work across matchDynamic calls, and
- duplicated work within a single matchDynamic call.
This patch addresses 1. as I see it as responsibility of MatchSwitch. Your comment seems to be about 2., if I'm reading it correctly. Do you think it's possible and makes sense to solve it in the AST matcher framework? Would that address your concerns?
clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h | ||
---|---|---|
159 | The outer lambda was a leftover. I removed it and added a comment. |
Much clearer now, thanks for the changes! The only real question I have is what's the best design for the cache -- lambda or explicit pair. I leave it up to you.
Yes, it does! Thanks for clarifying.
I thought the concern was more specific to re-matching particular types, like std::optional. For that, we could lazily store the declaration nodes (like we do in https://github.com/llvm/llvm-project/blob/main/clang/lib/Analysis/FlowSensitive/Models/ChromiumCheckModel.cpp with CheckDecls). But, I'm less clear on the impact of storing the result of the match itself.
I was thinking that maybe we can split the problem in two:
- duplicated work across matchDynamic calls, and
- duplicated work within a single matchDynamic call.
This patch addresses 1. as I see it as responsibility of MatchSwitch. Your comment seems to be about 2., if I'm reading it correctly.
Right. This division makes sense, particularly the point about responsibility. Point 2 is very much independent of MatchSwitch. But, I think matching the optional type is about both really in that every time we call the matcher we'll be checking the optional type, even if we only check it once *within* a single match. Still, fixing it doen't belong in MatchSwitch.
Do you think it's possible and makes sense to solve it in the AST matcher framework? Would that address your concerns?
Interesting idea -- basically, we'd need to build a caching mechanism into matchers based on names (identity really). That might make sense in general -- it's certainly a general concern -- but I'd be more inclined to try to fix it locally for now. That said, before we go there I'd want to profile the code, so no rush on this regardless.
clang/include/clang/Analysis/FlowSensitive/MatchSwitch.h | ||
---|---|---|
162 | nit: maybe replace Matcher with Action or SpecializedAction? | |
176–180 | Given that we only use Actions and Index in Actions[Index], what about only stashing that? Similarly, for MatchResult? [&Action = Actions[Index], Result = ast_matchers::MatchFinder::MatchResult(Results[0], &Context), &Stmt] ... Alternatively, cache the pair of Action, Result directly instead of caching a lambda. This saves the path through a std::function and might make the code more readable (although, that's a matter of taste). |
Thanks! Did you have a chance whether this makes a difference in real world scenarios? I'm mostly curious because I do not have a good mental model of how the matchers are implemented, specifically what optimizations are in place, so I don't really know how much of an impact can caching make :)
Once the rest of the comments are resolved this looks good to me.
+1, I like how the problem was split. I also agree that MatchSwitch might not be the best place to fix #2 as a more broad fix would potentially benefit other parts of Clang, like Tidy. But speaking of other parts of Clang, I wonder whether MatchSwitch is something that could be part of the ASTMatcher library. Some people might find it useful for implementing tools other than flow sensitive checks. Although, I do not have a strong preference, I'm completely fine leaving it where it currently is.
In general, for clang-tidy style use, we've found that building the AST dwarfs the cost of the matchers, so there's limited room for performance improvements from the matchers themselves. That said, the hasName matcher is optimized in the case of qualified names, and I expect that was motivated by noticable performance costs from the easy version of just printing the fully qualified name of the type and comparing the strings.
The complication from caching is that it can only be done when we're sure that the name is unique. So, either its a fully qualified name, or the user (of the matcher) tells us explicitly that it can be cached. That might make the hasName interface a little clunkier, but maybe its worth it.
Once the rest of the comments are resolved this looks good to me.
+1, I like how the problem was split. I also agree that MatchSwitch might not be the best place to fix #2 as a more broad fix would potentially benefit other parts of Clang, like Tidy. But speaking of other parts of Clang, I wonder whether MatchSwitch is something that could be part of the ASTMatcher library. Some people might find it useful for implementing tools other than flow sensitive checks. Although, I do not have a strong preference, I'm completely fine leaving it where it currently is.
I'll check w/ Aaron Ballman, one of the main contributors/reviewers for matchers these days.
What is this function doing? Please add a comment. Also, I don't understand how it compiles -- the return type is std::function<void(State &)> but it returns a lambda of type (State &S) -> std::function<void(State &)>?