This revision updates RewriteRule to support multiple subrules that are interpreted as an ordered-choice. With this feature, users can write the rules that appear later in the list of subrules knowing that previous rules' patterns *have not matched*, freeing them from reasoning about those cases in the current pattern.
Details
- Reviewers
ilya-biryukov - Commits
- rZORG200305468c9d: [LibTooling] Add support to Transformer for composing rules as an ordered…
rG200305468c9d: [LibTooling] Add support to Transformer for composing rules as an ordered…
rG8369a9beb7ed: [LibTooling] Add support to Transformer for composing rules as an ordered…
rL361037: [LibTooling] Add support to Transformer for composing rules as an ordered…
rC361037: [LibTooling] Add support to Transformer for composing rules as an ordered…
Diff Detail
- Repository
- rC Clang
Event Timeline
- Could you provide a few real-world motivating examples? Especially interested in cases that were complicated before and are easy to do now?
- Do we expect most the rules to be written using the new API or is this something that's gonna be used in 5-10% of the cases?
- Could we consider expressing the CompositeRewriteRule as a RewriteRule itself? That might require extending the RewriteRule abstraction, but the mental model that makes sense to me is: a rewrite rule matches some AST nodes and applies replacements to them. A CompositeRewriteRule seems to match the same model, despite the fact it contains other rules. Would be nice if we could make this fact internal to the implementation.
Also a bit uneasy about the naming. Composite suggests the individual rules compose one after another, while in practice only one alternative is chosen.
Sure, below are some examples based on actual tidies I've written (although not released). Let me know whether you think any of these should be expressed in the comments:
EXAMPLE 1
Consider a type T with a deterministic serialization function, serialize(). For performance reasons, we would like to make it non deterministic. Therefore, we want to drop the expectation that a.serialize() = b.serialize() iff a = b (although we’ll maintain deserialize(a.serialize()) = a).
Let’s assume this comes up in testing. We have three cases to consider:
EXPECT_EQ(a.serialize(), b.serialize()); EXPECT_EQ(a, b.serialize()); EXPECT_EQ(a.serialize(), b);
With ordered choice, you can encode the above directly into matchers and then compose them into an ordered choice rule. Without it, you’d have to write these as
EXPECT_EQ(a.serialize(), b.serialize()); EXPECT_EQ(a, b.serialize()); // where a != a’.serialize() EXPECT_EQ(a.serialize(), b); // where b != b’.serialize()
where the constraints in the comments are explicitly in the matchers.
EXAMPLE 2
Consider patterns of the form
absl::WrapUnique(e.release())
where e is an expression of type std::unique_ptr<...>. This pattern was necessary in gcc for certain circumstances relating to upcasting the value type of the unique_ptr. It is not necessary in clang, and so should be removed.
In some cases, it should be rewritten to e and sometimes to std::move(e), depending on the value category of e and the context (the expression of a return statement or not). That is, there are two cases:
absl::WrapUnique(e.release()) rewrites to e, if e is an r/x-value OR the call appears as a return expression,
absl::WrapUnique(e.release()) rewrites to std::move(e).
With ordered choice, the second case is written plainly (that is, just translate that expression to a matcher). Without it, you need to further express: unless e is an r/x-value OR the call appears as a return expression.
EXAMPLE 3
More generally, anywhere you’d use anyOf(m1.bind(“m1”), m2.bind(“m2”)) and then dispatch on those tags in your code to decide what to do, we’ll lift that behavior to the rule level, so you can write
makeOrderedRule({
makeRule(m1, action1), makeRule(m2, action2), …
});
- Do we expect most the rules to be written using the new API or is this something that's gonna be used in 5-10% of the cases?
I'd expect higher than 10% but not the majority of cases. However, given that this semantics exactly matches that of anyOf, it may well be more common than I expect.
- Could we consider expressing the CompositeRewriteRule as a RewriteRule itself? That might require extending the RewriteRule abstraction, but the mental model that makes sense to me is: a rewrite rule matches some AST nodes and applies replacements to them. A CompositeRewriteRule seems to match the same model, despite the fact it contains other rules. Would be nice if we could make this fact internal to the implementation.
Also a bit uneasy about the naming. Composite suggests the individual rules compose one after another, while in practice only one alternative is chosen.
Great points! I'm fine with collapsing the types into one. As you can see from line 260 in Transformer.h, we're already folding the single rule into the composite rule. As for naming, agreed, but does that concern drop away once we have only a single RewriteRule definition?
As for naming, agreed, but does that concern drop away once we have only a single RewriteRule definition?
Sure, that won't be an issue.
The use-cases make sense, thanks for the examples. Could you add one or two to the code? Would probably make sense to express them in code as matchers, rather than explaining what we're trying to do in a natural language.
Let's proceed with making RewriteRule the vocabulary type that we use for transformations like this if we both agree that's a good idea.
There are obviously multiple ways to tackle this, which one you had in mind?
Sounds good, will do.
Let's proceed with making RewriteRule the vocabulary type that we use for transformations like this if we both agree that's a good idea.
There are obviously multiple ways to tackle this, which one you had in mind?
Something like this, although if you have a better name than RewriteAction I'm open to alternatives. e.g. RewriteCase?
struct RewriteAction { SmallVector<ASTEdit, 1> Edits; TextGenerator Explanation; }; struct RewriteRule { ast_matchers::internal::DynTypedMatcher Matcher; std::vector<RewriteAction> Actions; static constexpr llvm::StringLiteral RootId = "___root___"; };
Or, nest the definition:
struct RewriteRule { struct Action { SmallVector<ASTEdit, 1> Edits; TextGenerator Explanation; }; ast_matchers::internal::DynTypedMatcher Matcher; std::vector<Action> Actions; static constexpr llvm::StringLiteral RootId = "___root___"; };
Sorry for the delay.
I personally like the RewriteRule::Action best. Allows to use a rather common term, while still avoiding any possible confusion.
I tried going with RewriteRule::Action (it was my preference as well), but found that it didn't work well once I tried folding the composite rules into RewriteRule. The problem is that the Action structure forces you to eagerly join the matchers (which is what my first diff did). But then it doesn't work to join multiple rules that themselves were built with makeOrderedRule. Previously, this was impossible because they had different types, but now a user could conceivably do this. So, I changed the implementation build the joined matcher later. RewriteRule now just stores all of the (sub)rules in a single list, and makeOrderedRule is a trivial merge. The real work is done at registerMatchers()-time where we build the joined matcher for the whole rule. But, this means keeping the whole (sub)rule around, not just the Action part.
Alternatives I considered:
- We could use the Action structure inside Transformer -- instead of storing a RewriteRule, we could simply store a single matcher and vector of Action. In the constructor, we would call buildMatcher(Rule) and then copy only the action parts from the rules. However, I don't think this added complexity will gain us anything other than a structure which more faithfully reflects the control flow -- the matchers in each case are "dead" once we build the joined matcher, and the Action approach would reflect that.
- We could add more structure to Case, like:
struct RewriteRule { struct Action { SmallVector<ASTEdit, 1> Edits; TextGenerator Explanation; }; struct Case { ast_matchers::internal::DynTypedMatcher Matcher; Action A; }; std::vector<Cases> Cases; };
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
196 ↗ | (On Diff #198811) |
Could you elaborate why we want this restriction? It feels ok to have transformations to completely unrelated nodes and still apply the first one. |
224 ↗ | (On Diff #198811) | Why would we need braces around each call? Aren't they a rewrite rule in themselves? |
232 ↗ | (On Diff #198811) | Any ideas for a better name? pickFirst or applyFirst? |
240 ↗ | (On Diff #198811) | Could they be made private? If not, why? |
247 ↗ | (On Diff #198811) | s/Action/Case? Or am I missing something? |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
182 ↗ | (On Diff #198811) | NIT: maybe name it isBaseOf? When talking about class hierarchies, using 'base' and 'derived' is a common convention. |
250 ↗ | (On Diff #198811) | I wonder if there a better way to construct an anyOf matcher that can tell which case matched... Any ideas? |
Agreed on all the comments -- just one question:
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
---|---|---|
250 ↗ | (On Diff #198811) | I don't quite follow. Which interface complication are you referring to? FWIW, the reason the code here doesn't just use the anyOf() matches is that it doesn't take a vector -- it only has a variadic form. |
Thanks for the review. PTAL.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
196 ↗ | (On Diff #198811) | Agreed. The problem is purely from the implementation perspective. Since anyOf() enforces this restriction, I need to either
I figured for a first pass, we'd go with 1. if you disagree, I can add the logic to bucket them and thereby support arbitrary combinations. FWIW, I think it's a desirable feature, so its not wasted work. its just a matter of splitting up the work. |
224 ↗ | (On Diff #198811) | typo. thanks! |
232 ↗ | (On Diff #198811) | Yes, those are both better. Went w/ applyFirst. Also considered applyFirstMatch but not sure that buys much clarity |
240 ↗ | (On Diff #198811) | I tried to clarify this. But, I'm also fine splitting these three into a separate header file or moving them to the bottom of this one. They're only exposed at all because TransformerTidy lives in a different directory and namespace. WDYT? |
The implementation LG, thanks! Just a few NITs and comments about the public interface and the comments
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
195 ↗ | (On Diff #199465) | NIT: Maybe shorten a bit? Something like Applies the first rule whose pattern matches, other rules are ignored The current version has a bit too many details, so it's hard to grasp on a first read. |
215 ↗ | (On Diff #199465) | NIT: s/Ordered rules allow us/applyFirst allows to... ? With a new name for the function, "ordered" rules can be confusing |
231 ↗ | (On Diff #199465) | Maybe start with this? It's a great analogy. Something like this at the start of the comment would be great: `applyFirst` is similar to `anyOf` matcher with a replace action attached to each of its cases... |
196 ↗ | (On Diff #198811) | Ah, ok, so this is the restriction of anyOf. |
232 ↗ | (On Diff #198811) | applyFirst looks good, thanks! |
240 ↗ | (On Diff #198811) | I see. Maybe additionally put them into namespace detail? |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
239 ↗ | (On Diff #199465) | NIT: remove redundant {} |
262 ↗ | (On Diff #199465) | NIT:s/Finds the rule/Finds the case |
250 ↗ | (On Diff #198811) | E.g. the restriction that the matchers should have a common kind seems to come from the fact that we need to later find out which case matched. It this a limitation of the AST matchers design? E.g. I can't match both a type and an expression and bind different subnodes in each submatcher, right? |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
196 ↗ | (On Diff #198811) | indeed -- and all the other variadoc operations. |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
250 ↗ | (On Diff #198811) |
Yes, as far as I can tell, but I don't think its connected to the binding -- every DynTypedMatcher needs to report what kind it supports. IIRC, this is connected to the desire to specialize matchers to types to provide some static checking for matchers. Since there is no universal/root AST type, there's no root kind, and we're stuck w/ this restriction. If we were willing for the Matcher class to diverge from the AST, we could add a "universal" node kind, but that's another discussion... In practice, this is mitigated in matchers by forcing the user to call a different addMatcher for each kind. But, that won't work for us since we want to (ultimately) support rules of different (base) kinds. |
LG, just a few NITs wrt to exposing implementation details in the header.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
161 ↗ | (On Diff #199476) | NIT: maybe clarify what "ordered" means? E.g. "The first rule that matches gets applied and the rest are ignored" |
278 ↗ | (On Diff #199476) | Can it be declared in .cpp file instead? Or is it used in clang-tidy integration? |
282 ↗ | (On Diff #199476) | Same here, so far this looks like an implementation detail of Transformer, why not put it into .cpp file? |
Thanks!!
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
278 ↗ | (On Diff #199476) | buildMatcher and findSelectedCase will be used in the clang-tidy integration and in the apply-rule-to-single-node function that I'm planning. |
282 ↗ | (On Diff #199476) | see above. I think these also make sense as part of the RewriteRule abstraction. that is, you can think of these as methods of RewriteRule and they make sense even without thinking about transformer. That said, if you think there's a way to make this clearer, i"m happy to adjust the comments/name, etc. |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
278 ↗ | (On Diff #199476) | I'd say this makes these functions a public interface of rewrite rule, albeit it's an "advanced" use-case. |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
278 ↗ | (On Diff #199476) | Agreed. I noted this explicitly with a FIXME, reworded the comments to explicitly associate these with RewriteRule and moved them to immediately after the other RewriteRule functions. Transformer is now the last declaration in the file (and should probably be split into its own header at this point, being just one interpreter of RewriteRule among many). |