This is an archive of the discontinued LLVM Phabricator instance.

[LibTooling] Add support to Transformer for composing rules as an ordered choice.
ClosedPublic

Authored by ymandel on Apr 30 2019, 12:49 PM.

Details

Summary

This revision updates RewriteRule to support multiple subrules that are interpreted as an ordered-choice. With this feature, users can write the rules that appear later in the list of subrules knowing that previous rules' patterns *have not matched*, freeing them from reasoning about those cases in the current pattern.

Diff Detail

Repository
rC Clang

Event Timeline

ymandel created this revision.Apr 30 2019, 12:49 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2019, 12:49 PM
  • Could you provide a few real-world motivating examples? Especially interested in cases that were complicated before and are easy to do now?
  • Do we expect most the rules to be written using the new API or is this something that's gonna be used in 5-10% of the cases?
  • Could we consider expressing the CompositeRewriteRule as a RewriteRule itself? That might require extending the RewriteRule abstraction, but the mental model that makes sense to me is: a rewrite rule matches some AST nodes and applies replacements to them. A CompositeRewriteRule seems to match the same model, despite the fact it contains other rules. Would be nice if we could make this fact internal to the implementation.

Also a bit uneasy about the naming. Composite suggests the individual rules compose one after another, while in practice only one alternative is chosen.

  • Could you provide a few real-world motivating examples? Especially interested in cases that were complicated before and are easy to do now?

Sure, below are some examples based on actual tidies I've written (although not released). Let me know whether you think any of these should be expressed in the comments:

EXAMPLE 1
Consider a type T with a deterministic serialization function, serialize(). For performance reasons, we would like to make it non deterministic. Therefore, we want to drop the expectation that a.serialize() = b.serialize() iff a = b (although we’ll maintain deserialize(a.serialize()) = a).

Let’s assume this comes up in testing. We have three cases to consider:

EXPECT_EQ(a.serialize(), b.serialize());
EXPECT_EQ(a, b.serialize());
EXPECT_EQ(a.serialize(), b);

With ordered choice, you can encode the above directly into matchers and then compose them into an ordered choice rule. Without it, you’d have to write these as

EXPECT_EQ(a.serialize(), b.serialize());
EXPECT_EQ(a, b.serialize());  // where a != a’.serialize()
EXPECT_EQ(a.serialize(), b);  // where b != b’.serialize()

where the constraints in the comments are explicitly in the matchers.

EXAMPLE 2
Consider patterns of the form

absl::WrapUnique(e.release())

where e is an expression of type std::unique_ptr<...>. This pattern was necessary in gcc for certain circumstances relating to upcasting the value type of the unique_ptr. It is not necessary in clang, and so should be removed.

In some cases, it should be rewritten to e and sometimes to std::move(e), depending on the value category of e and the context (the expression of a return statement or not). That is, there are two cases:
absl::WrapUnique(e.release()) rewrites to e, if e is an r/x-value OR the call appears as a return expression,
absl::WrapUnique(e.release()) rewrites to std::move(e).
With ordered choice, the second case is written plainly (that is, just translate that expression to a matcher). Without it, you need to further express: unless e is an r/x-value OR the call appears as a return expression.

EXAMPLE 3
More generally, anywhere you’d use anyOf(m1.bind(“m1”), m2.bind(“m2”)) and then dispatch on those tags in your code to decide what to do, we’ll lift that behavior to the rule level, so you can write
makeOrderedRule({

makeRule(m1, action1), makeRule(m2, action2), …

});

  • Do we expect most the rules to be written using the new API or is this something that's gonna be used in 5-10% of the cases?

I'd expect higher than 10% but not the majority of cases. However, given that this semantics exactly matches that of anyOf, it may well be more common than I expect.

  • Could we consider expressing the CompositeRewriteRule as a RewriteRule itself? That might require extending the RewriteRule abstraction, but the mental model that makes sense to me is: a rewrite rule matches some AST nodes and applies replacements to them. A CompositeRewriteRule seems to match the same model, despite the fact it contains other rules. Would be nice if we could make this fact internal to the implementation.

Also a bit uneasy about the naming. Composite suggests the individual rules compose one after another, while in practice only one alternative is chosen.

Great points! I'm fine with collapsing the types into one. As you can see from line 260 in Transformer.h, we're already folding the single rule into the composite rule. As for naming, agreed, but does that concern drop away once we have only a single RewriteRule definition?

As for naming, agreed, but does that concern drop away once we have only a single RewriteRule definition?

Sure, that won't be an issue.

The use-cases make sense, thanks for the examples. Could you add one or two to the code? Would probably make sense to express them in code as matchers, rather than explaining what we're trying to do in a natural language.

Let's proceed with making RewriteRule the vocabulary type that we use for transformations like this if we both agree that's a good idea.
There are obviously multiple ways to tackle this, which one you had in mind?

As for naming, agreed, but does that concern drop away once we have only a single RewriteRule definition?

Sure, that won't be an issue.

The use-cases make sense, thanks for the examples. Could you add one or two to the code? Would probably make sense to express them in code as matchers, rather than explaining what we're trying to do in a natural language.

Sounds good, will do.

Let's proceed with making RewriteRule the vocabulary type that we use for transformations like this if we both agree that's a good idea.
There are obviously multiple ways to tackle this, which one you had in mind?

Something like this, although if you have a better name than RewriteAction I'm open to alternatives. e.g. RewriteCase?

struct RewriteAction {
  SmallVector<ASTEdit, 1> Edits;
  TextGenerator Explanation;
};

struct RewriteRule {
  ast_matchers::internal::DynTypedMatcher Matcher;
  std::vector<RewriteAction> Actions;
  static constexpr llvm::StringLiteral RootId = "___root___";
};

Or, nest the definition:

struct RewriteRule {
  struct Action {
    SmallVector<ASTEdit, 1> Edits;
    TextGenerator Explanation;
  };

  ast_matchers::internal::DynTypedMatcher Matcher;
  std::vector<Action> Actions;

  static constexpr llvm::StringLiteral RootId = "___root___";
};

Gentle ping...

Sorry for the delay.
I personally like the RewriteRule::Action best. Allows to use a rather common term, while still avoiding any possible confusion.

ymandel updated this revision to Diff 198811.May 9 2019, 6:40 AM

Folded CompositeRewriteRule into RewriteRule and added examples to comments.

Sorry for the delay.
I personally like the RewriteRule::Action best. Allows to use a rather common term, while still avoiding any possible confusion.

I tried going with RewriteRule::Action (it was my preference as well), but found that it didn't work well once I tried folding the composite rules into RewriteRule. The problem is that the Action structure forces you to eagerly join the matchers (which is what my first diff did). But then it doesn't work to join multiple rules that themselves were built with makeOrderedRule. Previously, this was impossible because they had different types, but now a user could conceivably do this. So, I changed the implementation build the joined matcher later. RewriteRule now just stores all of the (sub)rules in a single list, and makeOrderedRule is a trivial merge. The real work is done at registerMatchers()-time where we build the joined matcher for the whole rule. But, this means keeping the whole (sub)rule around, not just the Action part.

Alternatives I considered:

  1. We could use the Action structure inside Transformer -- instead of storing a RewriteRule, we could simply store a single matcher and vector of Action. In the constructor, we would call buildMatcher(Rule) and then copy only the action parts from the rules. However, I don't think this added complexity will gain us anything other than a structure which more faithfully reflects the control flow -- the matchers in each case are "dead" once we build the joined matcher, and the Action approach would reflect that.
  1. We could add more structure to Case, like:
struct RewriteRule {
  struct Action {
    SmallVector<ASTEdit, 1> Edits;
    TextGenerator Explanation;
  };
  struct Case {
    ast_matchers::internal::DynTypedMatcher Matcher;
    Action A;
  };
  std::vector<Cases> Cases;
};
ilya-biryukov added inline comments.May 14 2019, 8:06 AM
clang/include/clang/Tooling/Refactoring/Transformer.h
196 ↗(On Diff #198811)

All of the rules must use the same kind of matcher

Could you elaborate why we want this restriction?
Why they can't be of different kinds?

It feels ok to have transformations to completely unrelated nodes and still apply the first one.

224 ↗(On Diff #198811)

Why would we need braces around each call? Aren't they a rewrite rule in themselves?

232 ↗(On Diff #198811)

Any ideas for a better name? pickFirst or applyFirst?

240 ↗(On Diff #198811)

Could they be made private? If not, why?

247 ↗(On Diff #198811)

s/Action/Case? Or am I missing something?

clang/lib/Tooling/Refactoring/Transformer.cpp
182 ↗(On Diff #198811)

NIT: maybe name it isBaseOf? When talking about class hierarchies, using 'base' and 'derived' is a common convention.

250 ↗(On Diff #198811)

I wonder if there a better way to construct an anyOf matcher that can tell which case matched...
It looks to be the reason for a more complicated interface on the transformer side, but I'm not sure there's a way out of it.

Any ideas?

ymandel marked an inline comment as done.May 14 2019, 8:18 AM

Agreed on all the comments -- just one question:

clang/lib/Tooling/Refactoring/Transformer.cpp
250 ↗(On Diff #198811)

I don't quite follow. Which interface complication are you referring to? FWIW, the reason the code here doesn't just use the anyOf() matches is that it doesn't take a vector -- it only has a variadic form.

ymandel edited the summary of this revision. (Show Details)May 14 2019, 8:31 AM
ymandel updated this revision to Diff 199465.May 14 2019, 8:50 AM
ymandel edited the summary of this revision. (Show Details)

Response to comments.

ymandel marked 9 inline comments as done and an inline comment as not done.May 14 2019, 8:57 AM

Thanks for the review. PTAL.

clang/include/clang/Tooling/Refactoring/Transformer.h
196 ↗(On Diff #198811)

Agreed. The problem is purely from the implementation perspective. Since anyOf() enforces this restriction, I need to either

  1. pass it on to the user
  2. infer the base kind of each matcher and place them in the appropriate bucket.

I figured for a first pass, we'd go with 1. if you disagree, I can add the logic to bucket them and thereby support arbitrary combinations. FWIW, I think it's a desirable feature, so its not wasted work. its just a matter of splitting up the work.

224 ↗(On Diff #198811)

typo. thanks!

232 ↗(On Diff #198811)

Yes, those are both better. Went w/ applyFirst. Also considered applyFirstMatch but not sure that buys much clarity

240 ↗(On Diff #198811)

I tried to clarify this. But, I'm also fine splitting these three into a separate header file or moving them to the bottom of this one. They're only exposed at all because TransformerTidy lives in a different directory and namespace. WDYT?

The implementation LG, thanks! Just a few NITs and comments about the public interface and the comments

clang/include/clang/Tooling/Refactoring/Transformer.h
195 ↗(On Diff #199465)

NIT: Maybe shorten a bit? Something like

Applies the first rule whose pattern matches, other rules are ignored

The current version has a bit too many details, so it's hard to grasp on a first read.

215 ↗(On Diff #199465)

NIT: s/Ordered rules allow us/applyFirst allows to... ?

With a new name for the function, "ordered" rules can be confusing

231 ↗(On Diff #199465)

Maybe start with this? It's a great analogy. Something like this at the start of the comment would be great:

`applyFirst` is similar to `anyOf` matcher with a replace action attached to each of its cases...
196 ↗(On Diff #198811)

Ah, ok, so this is the restriction of anyOf.

232 ↗(On Diff #198811)

applyFirst looks good, thanks!

240 ↗(On Diff #198811)

I see. Maybe additionally put them into namespace detail?
A good way to make sure that any use is a red herring. (Otherwise clients would need to read the code to realize it's an internal function.

clang/lib/Tooling/Refactoring/Transformer.cpp
239 ↗(On Diff #199465)

NIT: remove redundant {}

262 ↗(On Diff #199465)

NIT:s/Finds the rule/Finds the case
Would help avoid potential confusion...

250 ↗(On Diff #198811)

E.g. the restriction that the matchers should have a common kind seems to come from the fact that we need to later find out which case matched.

It this a limitation of the AST matchers design? E.g. I can't match both a type and an expression and bind different subnodes in each submatcher, right?

ymandel updated this revision to Diff 199476.May 14 2019, 10:20 AM
ymandel marked 18 inline comments as done.

responded to comments.

ymandel added inline comments.May 14 2019, 10:22 AM
clang/include/clang/Tooling/Refactoring/Transformer.h
196 ↗(On Diff #198811)

indeed -- and all the other variadoc operations.

clang/lib/Tooling/Refactoring/Transformer.cpp
250 ↗(On Diff #198811)

E.g. the restriction that the matchers should have a common kind seems to come from the fact that we need to later find out which case matched.

It this a limitation of the AST matchers design? E.g. I can't match both a type and an expression and bind different subnodes in each submatcher, right?

Yes, as far as I can tell, but I don't think its connected to the binding -- every DynTypedMatcher needs to report what kind it supports. IIRC, this is connected to the desire to specialize matchers to types to provide some static checking for matchers. Since there is no universal/root AST type, there's no root kind, and we're stuck w/ this restriction. If we were willing for the Matcher class to diverge from the AST, we could add a "universal" node kind, but that's another discussion...

In practice, this is mitigated in matchers by forcing the user to call a different addMatcher for each kind. But, that won't work for us since we want to (ultimately) support rules of different (base) kinds.

LG, just a few NITs wrt to exposing implementation details in the header.

clang/include/clang/Tooling/Refactoring/Transformer.h
161 ↗(On Diff #199476)

NIT: maybe clarify what "ordered" means? E.g. "The first rule that matches gets applied and the rest are ignored"

278 ↗(On Diff #199476)

Can it be declared in .cpp file instead? Or is it used in clang-tidy integration?

282 ↗(On Diff #199476)

Same here, so far this looks like an implementation detail of Transformer, why not put it into .cpp file?

ilya-biryukov accepted this revision.May 16 2019, 9:20 AM
This revision is now accepted and ready to land.May 16 2019, 9:20 AM
ymandel updated this revision to Diff 199869.May 16 2019, 11:33 AM

adjust some API comments.

ymandel marked 5 inline comments as done.May 16 2019, 11:37 AM

Thanks!!

clang/include/clang/Tooling/Refactoring/Transformer.h
278 ↗(On Diff #199476)

buildMatcher and findSelectedCase will be used in the clang-tidy integration and in the apply-rule-to-single-node function that I'm planning.

282 ↗(On Diff #199476)

see above. I think these also make sense as part of the RewriteRule abstraction. that is, you can think of these as methods of RewriteRule and they make sense even without thinking about transformer. That said, if you think there's a way to make this clearer, i"m happy to adjust the comments/name, etc.

ilya-biryukov added inline comments.May 17 2019, 2:15 AM
clang/include/clang/Tooling/Refactoring/Transformer.h
278 ↗(On Diff #199476)

I'd say this makes these functions a public interface of rewrite rule, albeit it's an "advanced" use-case.
It's probably ok to keep them in detail namespace for now, but would be nice to come up with a proper public functions that allow us to implement those use-cases.
(Or declare these function public and well-supported and move them out of detail at some point)

ymandel updated this revision to Diff 200026.May 17 2019, 5:51 AM
ymandel marked 2 inline comments as done.

updated comments; moved some decls.

ymandel marked 2 inline comments as done.May 17 2019, 5:53 AM
ymandel added inline comments.
clang/include/clang/Tooling/Refactoring/Transformer.h
278 ↗(On Diff #199476)

Agreed. I noted this explicitly with a FIXME, reworded the comments to explicitly associate these with RewriteRule and moved them to immediately after the other RewriteRule functions. Transformer is now the last declaration in the file (and should probably be split into its own header at this point, being just one interpreter of RewriteRule among many).

ymandel updated this revision to Diff 200042.May 17 2019, 7:13 AM
ymandel marked an inline comment as done.

Synced to HEAD in preparation for committing.

This revision was automatically updated to reflect the committed changes.