Adds a basic version of Transformer, a library supporting the concise specification of clang-based source-to-source transformations. A full discussion of the end goal can be found on the cfe-dev list with subject "[RFC] Easier source-to-source transformations with clang tooling".
Details
- Reviewers
ilya-biryukov - Commits
- rGfdd98782aaab: [LibTooling] Add Transformer, a library for source-to-source transformations.
rC357768: [LibTooling] Add Transformer, a library for source-to-source transformations.
rL357768: [LibTooling] Add Transformer, a library for source-to-source transformations.
rGd5856302f7e3: [LibTooling] Add Transformer, a library for source-to-source transformations.
rL357576: [LibTooling] Add Transformer, a library for source-to-source transformations.
rC357576: [LibTooling] Add Transformer, a library for source-to-source transformations.
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
Buildable 29936 Build 29935: arc lint + arc unit
Event Timeline
Looks very polished, thanks!
Will have to sink the change in a bit, will come back with more comments on Monday.
In the meantime, a few initial comments and suggestions.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
55 | Intuitively, it feels that any filtering should be possible at the level of the AST matchers. Is that not the case? Please note I have limited experience with AST matchers, so there might be some obvious things that I'm missing. | |
136 | Maybe consider separating the fluent API to build the rewrite rule from the rewrite rule itself? Not opposed to having the fluent builder API, this does look nice and seems to nicely align with the matcher APIs. class RewriteRule { public: RewriteRule(DynTypedMatcher, TextGenerator Replacement, TextGenerator Explanation); DynTypedMatcher matcher() const; Expected<string> replacement() const; Expected<string> explanation() const; }; struct RewriteRuleBuilder { // Having a better name than 'Builder' would be nice. RewriteRule finish() &&; // produce the final RewriteRule. template <typename T> RewriteRuleBuilder &change(const TypedNodeId<T> &Target, NodePart Part = NodePart::Node) &; RewriteRuleBuilder &replaceWith(TextGenerator Replacement) &; RewriteRuleBuilder &because(TextGenerator Explanation) &; private: RewriteRule RuleInProgress; }; RewriteRuleBuilder makeRewriteRule(); |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
55 | Good point. The examples I have would actually be perfectly suited to a matcher. That said, there is not matcher support for a simple predicate of this form, along the lines of gtest's Truly(predicate). I'll remove this and separately try to add something like Truly to the matchers. | |
136 | I see your point, but do you think it might be enough to improve the comments on the class? My concern with a builder is the mental burden on the user of another concept (the builder) and the need for an extra .build() everywhere. To a lesser extent, I also don't love the cost of an extra copy, although I doubt it matters and I suppose we could support moves in the build method. |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
136 | The issues with the builder interface is that it requires lots of boilerplate, which is hard to throw away when reading the code of the class. I agree that having a separate builder class is also annoying (more concepts, etc). Keeping them separate would be my personal preference, but if you'd prefer to keep it in the same class than maybe move the non-builder pieces to the top of the class and separate the builder methods with a comment.
I believe we can be as efficient (up to an extra move) with builders as without them. If most usages are of the form RewriteRule R = rewriteRule(...).change(...).replaceWith(...).because(...); class RewriteRuleBuilder { operator RewriteRule () &&; /// ... }; RewriteRuleBuilder rewriteRule(); void addRule(RewriteRule R); void clientCode() { addRule(rewriteRule().change(Matcher).replaceWith("foo").because("bar")); } |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
136 | I didn't realize that implicit conversion functions are allowed. With that, I'm fine w/ splitting. Thanks! |
I've just realized the review is probably missing the latest revision.
@ymandel, could you upload the new version?
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
55 | Makes sense! Maybe put a FIXME here to let the readers know this is moving to the ast matchers? | |
58 | Note: I'm probably not seeing the latest version, so this comment might be outdated. Feel free to ignore if you've already moved this to the matchers. BTW, I was wondering whether MatchFilter carries its weight? What are the pros over: using MatchFilter = std::function<bool(const ast_matchers::MatchFinder::MatchResult &Result)>; The only thing thats come to mind is that default-constructed function cannot be called, which is less useful than returning true in a default-constructed. Do you have other reasons to have it in mind that I'm missing? | |
86 | What happens if member is composed of multiple tokens? E.g. foo.bar::baz; // member tokens are ['bar', '::', 'baz'] foo.template baz<int>; // member tokens are ['template', 'baz', '<', 'int', '>'] foo.operator +; // member tokens are ['operator', '+'] I might be misinterpreting the meaning of "member" token. | |
90 | Same here, what happens to the template arguments and multi-token names, e.g. | |
136 | Have you uploaded the new version? I don't seem to see the split. |
clang/unittests/Tooling/TransformerTest.cpp | ||
---|---|---|
158 | NIT: maybe consider adding overloads for these function that allow to pass char literals there? This is probably a common case, so it'd be nice to write this as: replaceWith("REPLACED").because("Use size() method directly on string."); |
Working on the new version now. Will note with "PTAL" once that's ready. Sorry that wasn't clear in earlier responses.
Got you! Waiting for the new revision.
Sorry that wasn't clear in earlier responses.
Or maybe it's me misreading earlier responses. Happy to review the new changes as soon as they land.
Split RewriteRule into two classes, remove support for where clause, support multi-token targets, and add corresponding tests.
Addressed the most major comments. Still working on some smaller ones. PTAL. Thanks!
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
55 | I've removed the where clause entirely. I'll separately add the matcher support (I've figured out how to do it within the existing framework). | |
86 | Good catch! I've updated to handle these correctly (I believe). Added some tests, plan to add some more. | |
90 | Good point. This seems difficult to get right, since NamedDecl does not carry sufficient loc data. However, I've updated the code to explicitly fail in such cases, so at least we won't have bad rewrites. BTW, any idea whether constructor initializers can ever be multi token? | |
136 | I have now. FWIW, I've left both ref overloads, but what do you think of dropping the lvalue-ref overloads and only supporting rvalue refs? Users can still pretty easily use an lvalue, just by inserting a trivial std::move() around the lvalue. | |
clang/unittests/Tooling/TransformerTest.cpp | ||
158 | Sure, will do in next revision. The version that's integrated w/ stencils does exactly this, btw. I wonder, though, whether we should add an implicit constructor to the TextGenerator class instead of adding overloads. I have a general distrust of implicit constructors, but I see that's not the case in the clang codebase. WDYT? |
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
90 |
Two cases come to mind:
Full example: namespace ns { struct X { X(int); }; } template <class ...Bases> struct Y : ns::X, Bases... { Y() : ns::X(10), Bases(10)... { } }; struct Z { Z(int); }; struct W { W(int); }; Y<Z, W> y; |
- Remove lvalue-ref overloads in builder
- add StringRef overloads for TextGenerator-taking methods
- constrain Name targeting for ctor initializers to explicitly written initializers.
- add multi-token member test
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
90 | Turns out the code was already filtering these cases. I added an addition constrain of I->isWritten() for initializers. So, only explicit initialization of fields is allowed, and therefore I'd venture guaranteed to be a single token. I noticed that Kythe seems to make the same assumption. That said, I could change to code to specify the range as a char-range of getMemberLoc(), getLParenLoc() if we can't rely on that (single-token) guarantee. |
Mostly nits and small API questions form my side.
This looks very good overall! I'm planning to take a closer look at the handling of macros and various AST nodes in more detail this week, but the approach looks solid from a higher level.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
39 | I'm not sure if this is in the LLVM style guide, but we might want to avoid introducing these names into clang::tooling namespaces in the headers. My fear is that users will rely on those using without knowing that explicitly and won't add corresponding using directives or qualifiers to their .cpp files, making refactoring and moving the code around harder. Could you fully-qualify those names in the header instead? There does not seem to be too many of them. | |
66 | Why would a TextGenerator fail? | |
117 | NIT: maybe move all inits to the constructor? | |
136 | Yeah, LG, having only a single overload means less boilerplate and the fluent-builder APIs are mostly used exclusively as r-values anyway. | |
171 | NIT: maybe remove the =default copy and move ctors and assignments? | |
192 | NIT: maybe accept a std::string to let the clients pass a std::string if they already have it, avoiding extra copies in some cases? | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
29 | Other files in the tooling library seem to be adding using namespace clang instead of putting the declaration into a namespace. | |
31 | Why put using directives into an anonymous namespace? | |
45 | NIT: use UpperCamelCase for local variable names to conform to LLVM style. | |
111 | NIT: maybe add the corresponding assertion at the start of the function? | |
111 | s/node/Node | |
112 | NIT: consider merging verifyTarget into getTarget and making getTarget return Expected<CharSourceRange>. Also, having the invariants closer to the code using them makes it easier to certify both are correct, e.g. seeing that NamedDecl.isIdentifier() was checked before accessing the NamedDecl.getName() in the same function is simpler. | |
135 | NIT: start a fixme at a new line to make it more visible. | |
145 | This could be operator+ too, right? struct X { X operator+(int); void test() { X (X::*ptr)(int) = &X::operator+; } }; Would probably want to bail out in this case as well. | |
161 | The comment still mentions the where clause. A leftover from the previous version? | |
200 | Maybe use std::string in the public interface anyway?
| |
229 | NIT: remove braces around a single statement. |
Simplified TextGenerator, changed namespace handling, collapsed verifyTarget() into getTarget(), and other changes in response to comments.
Thanks for the detailed review and really helpful comments!
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
39 | right. I'd intended to introduce these into the clang tooling namespace -- that is, these weren't just convenience aliases for the header file. But, I no longer think that's useful in any case, so dropping them is certainly best. | |
66 | Sure. I could go either way. I think some of these cases fall on the border between an invariant violation and "invalid argument" or some such. But, let's keep it simpler for now. | |
117 | nicer, thanks. | |
171 | Sure. I was going based on google's style recommendations, but personally i prefer leaving them implicit. | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
29 | Done. Agreed about being consistent. FWIW, I can't say I like this style. Perhaps because I'm not used to it, but it feels too implicit. It forces the reader to figure out where each definition is being associated. Also, I discovered it only works for method definitions. Free functions still need to be explicitly namespaced. Any idea what the reason for this style is? | |
31 | You gain an extra little bit of robustness against clashing with something declared in the enclosing namespace. But, this is overkill even for me -- I generally only do this when I already have an anonymous namespace; there's no good reason to create one for this purpose. | |
112 | Yes, I like it much better this way. The split wasn't worth it. | |
135 | Done. | |
145 | The check E->getNameInfo().getName().isIdentifier() rules that out. It was hidden in verifyTarget(). I hope the new code makes this clearer? | |
161 | right. deleted the comment altogether. | |
200 | Agreed. Updated all relevant places. |
The only real question I have is about returning an error vs an empty transformation in of macros.
The rest are just NITs.
Thanks for the change! I'll get to the NodeId patch right away :-)
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
71 | NIT: maybe mention what it should be used for? we plan to show to to the user (e.g. in the clang-tidy fix description), right? | |
90 | Could you also add examples on how to apply the rewrite rule here? | |
98 | NIT: maybe assert this invariant in the constructor? | |
123 | This method does not seem to be used anywhere. | |
171 | +1. Means less boilterplate. | |
176 | NIT: the top-level std::move looks redundant, maybe remove it? | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
29 | I personally have a slight preference towards using namespace style myself because it produces compiler errors whenever declaration and definition signatures don't match, catching errors early in cases where one forgets to update either of those, e.g. // foo.h namespace clang{ std::unique_ptr<Something> createSomething(); } // foo1.cpp using namespace clang; std::unique_ptr<Something> clang::createSomething(int Param) { // <-- error } // foo2.cpp namespace clang { std::unique_ptr<Something> clang::createSomething(int Param) { // <-- all ok, will get linker errors on usages instead. } } There might be other reasons and I'm not sure why exactly the code in clang uses this approach. | |
32 | NIT: we could leave out the trailing :: for clang and llvm, they are well-known and unambiguous namespace names. | |
43 | NIT: could you add a brief comment on what this function is expected to return? | |
134 | NIT: remove namespaces for consistency with the rest of the code. (Probably a leftover from the previous version) | |
145 | Right, thanks! Moving the check into the code makes it more explicit now, thanks! | |
151 | NIT: consider simply propagating the original error up the stack in that case to avoid boilerplate. | |
158 | Why not return an error in case of macros too? Is there any use of the empty transformation to the clients? |
clang/unittests/Tooling/TransformerTest.cpp | ||
---|---|---|
62 | Maybe put the whole file into an anonymous namespace? | |
89 | NIT: remove redundant braces | |
160 | These three lines are in each test. Consider moving them to a helper function (e.g. returning the final result as text?) to reduce the boilerplate in the test. |
Thanks for (more) helpful comments. I think the code is a lot better now than it started out. :)
Also, converted RewriteRule to a simple struct as per our discussion.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
71 | Done. But, given that we don't use this field yet, I'm fine deleting it until a future revision. I'm also fine leaving it given that we know we'll be needing it later. | |
90 | Is this what you had in mind? Unlike clang-tidy, there is no neat infrastructure into which we can drop it. | |
98 | The constructor sets this field, so no need to assert. | |
123 | It was used above in makeMatcher. But, it was overkill -- users can just reference RootId directly, so I've removed it. | |
176 | Yes, not sure why did that. thanks. | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
134 | This is necessary as far as I can tell. W/o it, I get a linker failure (missing definition). Based on the declaration in the header, the compiler resolves the reference below to clang::tooling::applyRewriteRule() but this definition ends up in the global namespace. I think the using decls only work for method definitions -- the type seems to resolve to be the one in the namespace. But free functions don't get the same treatment. Unless I"m doing something wrong? | |
151 | integrated id into getTarget so we can avoid this error handling step. | |
158 | I think the common case is that we simply want to skip macros -- there's nothing erroneous about them, they're just usually hard to deal w/ correctly. That said, we might want to change this to *never* return an error and just assert when things go wrong. The advantage of the current design is that callers can in principle ignore failed rewrites. However, in practice, if the rewrite fails that means it had a bug, so it might be best to asssert(). Only real advantage, then, is that this is easier to test since it doesn't crash the program. WDYT? | |
clang/unittests/Tooling/TransformerTest.cpp | ||
62 | also removed static qualifiers from functions since they are now redundant. | |
160 | Thanks, much cleaner this way. |
Looks neat, thanks!
A final set of NITs from my side, I don't think any of the comments have to do with the actual design of the library, so expect LGTM in the next round.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
71 | Up to you, to me it feels like the presence of this field defines what this class is used for.
Both approaches make sense, and I assume (1) is the desired abstraction this class represents, so keeping the field looks ok. | |
83 | NIT: Maybe be more specific: string-based binding ids? | |
90 | Yeah, exactly, but could we keep is a bit shorter by removing the pieces which are non-relevant to the actual transformer usage. // To apply a rule to the clang AST, use Transformer class: // \code // AtomicChanges Changes; // Transformer T( // MyRule, [&Changes](const AtomicChange &C) { Changes.push_back(C); };); // ast_matchers::MatchFinder MatchFinder; // T.registerMatchers(&MatchFinder); // // ... insert code to run the ast matchers. // // Consume the changes. // applyAtomicChanges(..., Changes); Or just mention that Transformer class should be used to apply the rewrite rule and obtain the corresponding replacements. | |
126 | NIT: bound to this id | |
137 | NIT: Maybe remove the constructor and let the builder handle this? Up to you, of course. | |
159 | NIT: return by value to make the signatures less clunky. RVO should insure the generated code is the same in practice. | |
185 | Isn't this overload redundant in presence of buildRule(DynTypedMatcher)? Both seem to call into the constructor that accept a DynTypedMatcher, so Matcher<T> is convertible to DynTypedMatcher, right?. | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
134 | Yeah, you should explicitly qualify to let the compiler know the namespace of a corresponding declaration: Expected<Transformation> clang::tooling::applyRewriteRule(...) { // ... } tooling::applyRewriteRule would also work since we have using namespace clang::tooling; at the top of the file. | |
158 | The current approach LG, and we have the comments about those cases. I agree with you that handling macros is hard, so skipping them makes sense for most rewrite rules one would write and at the same time, it'd be useful to let the users of the code know that a rewrite rule could not be applied because of macros. | |
clang/unittests/Tooling/TransformerTest.cpp | ||
62 | LG. I'm personally happy either with or without static. |
Assorted changes in response to comments.
Most notably, dropped constructor from RewriteRule, in favor of putting meaningful initialization in the builder.
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
71 | Agreed. Keeping it. | |
90 | went w/ the second suggestion. | |
137 | Agreed, but DynTypedMatcher has no default constructor, so we have to provide something. I dropped the constructor, but added an initializer to Matcher to enable the default constructor. | |
185 | I think I was avoiding some extra construction, but even if so, I can't see its worth the added complexity. thx | |
clang/lib/Tooling/Refactoring/Transformer.cpp | ||
134 | ah... | |
158 | Agreed. If error handling was nicer, then I think that your suggestion would be ideal. | |
clang/unittests/Tooling/TransformerTest.cpp | ||
62 | static sounds good to me. |
Looks great, thanks!
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
110 | NIT: llvm::StringLiteral is a vocabulary type with compile-time size that could be used here. | |
113 | NIT: comma seems redundant, this should probably be: A fluent builder class |
Thank you for the extremely helpful and detailed review!!
clang/include/clang/Tooling/Refactoring/Transformer.h | ||
---|---|---|
110 | Went with StringLiteral -- given that we always wrap RootId in a stringref to use it, seems better just to define it that way to begin with. |
I've removed the dependency on NodeIds given our (off thread) discussions regarding their overall value and appropriateness in clang::tooling vs clang::ast_matchers. PTAL.
LGTM with the new changes. Specifying the clang::Expr type explicitly when calling change looks a bit less clear than the original approach, but also looks pretty clear to me.
Patch breaks the build with the shared libraries, for example, http://lab.llvm.org:8011/builders/clang-ppc64le-linux-multistage/builds/9498. Seems to me, you need to add an extra dependency on clangTooling. clangToolingCore is not enough.
You need to provide -DBUILD_SHARED_LIBS=true to cmake. Unfortunately, there is a dependency from the libToolingRefactor to libTooling. I don't think it will be easy to fix this.
Sever dependency of clangToolingRefactor on clangTooling via FixIt.
Transformer used FixIt, which causes a problematic dependency. This diff copies
the (minimal) code from FixIt to Transformer.cpp to break the dependency. For
the future, though, we should consider whether the FixIt library should live
somewhere accessible to components in Refactoring (perhaps Core?).
Given that we don't any usages of getExtendedRange, could you remove it from the Fixit.h and move to a header somewhere in the clangToolingRefactoring (e.g. into Transformer.h itself)?
Having two copies of this functions in the codebase does not look good, especially given that one of them is never used anyway.
Keeping it only in the cpp file also LG if we don't plan to have other usages for now, but please remove the version from FixIt.h
The Stencil library will need this as well and I've heard from another user who's already using this (a standalone tool, not in clang), so it seems to have a general value. Might the fixit library belong in Core? I have a whole bunch more utility functions coming along, so we need a place for them as well.
I propose that I create a new library in Core with getExtendedRange() and remove it from FixIt. The other utility functions that I need will also go there. We can separately investigate moving the remaining pieces of FixIt into Core.
Moving this to Core makes sense to me, the only concern that I have is that it's actually a small and focused library as it stands today and adding utility functions to it would seem to cause some chaos.
But it's fine as long as the maintainers of the library are ok with this. I'll ask around to find out who's actually responsible for the library and get back to you...
Per @ioeric's suggestion: why not move the helper into Tooling/Refactoring/ExtendedRange.h?
If it's in ToolingRefactoring, both stencil and transformer can access it.
For external users, a dependency on either ToolingCore or ToolingRefactoring should be fine, since they're fine with a dependency on Tooling already.
This sounds perfect. Can I do the same for the future additions -- small, focused libraries for each group of functions? I just want to be sure that we don't regret the name "ExtendedRange" when I need to add the next batch.
In clangd, we have a library called SourceCode.h that keeps source code related helpers including those that deal with ranges. I think it's reasonable to use SourceCode.h or something similar here.
I'm not sure if this is in the LLVM style guide, but we might want to avoid introducing these names into clang::tooling namespaces in the headers.
My fear is that users will rely on those using without knowing that explicitly and won't add corresponding using directives or qualifiers to their .cpp files, making refactoring and moving the code around harder.
Could you fully-qualify those names in the header instead? There does not seem to be too many of them.