- Only works for extraction from free functions
- Basic analysis of the code being extracted.
- Extract to void function
- Bail out if extracting a return, continue or break.
- Doesn't hoist decls yet
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
- Build Status
- Buildable 35902 - Build 35901: arc lint + arc unit 
Event Timeline
I guess you're looking for design review at this point.
But the most important part of that is the documentation of the high-level structure, which is entirely missing. I can't really infer it from the code because most of the design is (presumably) not implemented at this point :-)
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 36 | I think "range" or "region" may be a clearer name here than "target". | |
| 110 | what is the fix you're waiting for? | |
| 155 | These are important domain structures, can we move them out of the RecursiveASTVisitor? (which should be mostly restricted to managing state of the traversal itself) | |
| 156 | You've called this "reference", but I think it's actually a "referenceddecl" - you don't store one of these for each reference do you? | |
| 189 | why does D need to be a vardecl? | |
| 190 | please don't add messages to asserts that just echo the code. In order of preference: 
 | |
| 197 | why is this handled specially, instead of being part of the general case of "referring to a decl not visible in the target scope"? (in particular, if the context is forward-declared elsewhere, what's the problem? | |
| 203 | DRE is just one of the cases where we want to handle names. I think it'll be easier to handle more cases by changing this to accept a Decl* and moving the DRE->getDecl() to the caller | |
| 207 | (why beginloc rather than loc, throughout?) | |
| 211 | this line seems very suspect, what are you trying to do and why? | |
| 214 | can't we skip the indirection here and just have something like Ref.markOccurrence(DRE->getBeginLoc()), and avoid dealing with the enum here at all? | |
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 211 | Definitely needed, apart from the lack of clarity the current version has UB sometimes (DeclLocType + 1 might not be a valid value) But by "why" i mostly mean - this looks like the place for recording the location, not the place for deciding not to record the location because some algorithm later may not need it. Seems a lot simpler to record it unconditionally and worry about it later. | |
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 69 | Could you move the definition of RootStmt to class definition? Since it is used in a lot more places than just this one. | |
| 69 | 
 Can you also mention the reasoning/implications ? | |
| 72 | Can you make this and the computeEnclosingFunction{,Rng} free functions instead of static methods? | |
| 75 | haven't checked the implementation yet, but naming and the comments seem to be ambiguous. Do you mean *selected* childrens of Parent, instead of *all*? Because IIUC, Source refers to the part of the function body that is selected, not all of it, right? | |
| 223 | comments for some of the fields like SemicolonPolicy would be nice. also most of these looks like implementation details, consider moving internal ones to private | |
| 287 | could you give more details on what is collected and how? | |
| 361 | why not move this and 4 following functions into astvisitor? | |
Main themes are:
- there are a few places we can reduce scope for the first patch: e.g. template support
- internally, it'd be clearer to pass data around in structs and avoid complex classes unless the abstraction is important
- high-level documentation would aid in understanding: for each concept/function: what is the goal, how does it fit into higher-level goals, are there any non-obvious reasons it's done this way
- naming questions (this can solve some of the same problems as docs, but much more cheaply)
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 8 | this file will need an overview describing 
 | |
| 38 | this name is *also* pretty confusing because of SourceLocation, SourceRange etc classes. Best option might just be to make up something distinctive, like Zone :-/ | |
| 40 | nit: if you're going to document the values one-by-one, do it in a comment on each value instead | |
| 43 | nit: we don't use MACROCASE for enums in LLVM (I think). This of course makes namespacing more important, we should consider enum class here. I'd probably suggest enum class ZoneRelative { // or some name consistent with above
  Before,
  After,
  Inside,
  External
}; | |
| 46 | This looks like basically a struct (mostly public data members) where all the logic is in the constructor. | |
| 46 | nit: not sure view adds anything here. ExtractionSource or ExtractionZone? | |
| 51 | classify? | |
| 60 | when is this not Parent->Children.back()? | |
| 94 | I think you probably want to avoid extracting from lambdas and templates for now, too. | |
| 105 | nit: Please spell 'range' rather than 'rng' in function names, as it's not a standard abbreviation in clang[d] | |
| 108 | can we avoid adding template support in the first patch? It's complicated, and I don't think the code below handles it robustly. For instance, this case will be extracted (despite being dependent), because the reference to T is not seen by the ast visitor, and the type of the decls that are seen are not dependent: template <typename T>
int zero() { return std::vector<T>().size(); } | |
| 114 | It's not clear what this function does and why. Can you add a high-level comment? | |
| 125 | LLVM_FALLTHROUGH (we support other host compilers: msvc, gcc) | |
| 145 | what needs to be fixed, and why is it important? | |
| 176 | what is this function doing, and why? it seems like we're trying to determine that the selection consists of either a single completely-selected stmt, or a bunch of completely-selected statements with a common parent, and then we represent them as "all the children of a certain parent node". Splitting the logic into two distant functions, and not documenting this intent, makes it hard to follow. | |
| 207 | as above, this seems like it would be clearer as a struct. | |
| 215 | I'd suggest capturing the type as a QualType instead of a string + const + ref flag When types may need to be re-spelled, we'll need that extra information - the context needed to re-spell is available at render() time, not addParameter() time. | |
| 226 | We tend to avoid std::set except in the rare cases where we need both sorting and on-the-fly deduplication. Here, it seems std::vector would be enough, and explicitly sorting an the end would make the intent clearer. | |
| 227 | Hoisting isn't supported, drop this? | |
| 236 | as previously discussed, there's not enough (conceptual or implementation) commonality between the call and the declaration, can we have separate renderDefinition() and renderCall() functions? | |
| 263 | these TODOs don't really describe the structural change that's needed here: we need a way to apply a bunch of edits to the function text (in terms of the original coordinates). | |
| 401 | I'm confused about the decls to host part - this isn't actually supported right? Take it out of the name? Or are they detected in order to cause the refactoring to fail? | |
| 455 | you've got a return type, a comment, and a function name, and they all say the same thing. Can you explain (at a high level) what this is for? | |
| clang-tools-extra/clangd/unittests/TweakTests.cpp | ||
| 603 ↗ | (On Diff #215615) | the test helpers are really aimed at testing prepare/apply together rather than separately. These EXPECT_AVAILABLE could be EXPECT_EQ(apply(), ...). If prepare() returns false it'll return "prepare failed" or some string like that. | 
| 618 ↗ | (On Diff #215615) | rather than checking in commented-out tests, I'd prefer to check in the tests verifying current incorrect behavior with a FIXME. (or just leave the FIXME by itself) | 
| 639 ↗ | (On Diff #215615) | maybe add a comment saying why not? | 
| 649 ↗ | (On Diff #215615) | this line does nothing? | 
| 651 ↗ | (On Diff #215615) | IIRC you can't use raw strings inside a macro (it's legal, but one of the supported compilers chokes on it) | 
| 654 ↗ | (On Diff #215615) | please avoid large tests that test lots of things, test them one-at-a-time instead. The intent is clearer, and it's clearer what failures mean. | 
| 683 ↗ | (On Diff #215615) | please write these out explicitly instead. | 
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 361 | Ignore. | |
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 30 | did you mean no return *type* ? | |
| 31 | don't understand what you meant here | |
| 32 | could you elaborate on that one? You can ignore this comment if it is gonna be explained in the design doc(and you are planning to attach it before landing this revision) | |
| 43 | Tried hard to figure out what RAV is :D could you rather say RecursiveASTVisitor(RAV) ? | |
| 123 | IIRC, < operator in SourceLocations are merely for implementing hashing and doesn't really carry less than semantics. Could you rather use SM.isBeforeInSLocAddrSpace? | |
| 125 | same here | |
| 134 | Can you rather return a FunctionDecl ? | |
| 153 | maybe rather return an llvm::Optional/Expected ? | |
| 166 | I suppose this relies on the fact that "AST contains the nodes ordered by their begin location"? Could you add an assertion for that? | |
| 177 | what makes this fail-free ? | |
| 181 | This explains *what* the function is doing, but lacks the *reason why* Could you explain the reason either in here or in the call site? | |
| 182 | hasOnlyRootStmtsAsChildren ? | |
| 216 | why ? | |
| 226 | why is this function returning a shared_ptr ? | |
| 333 | why don't we infer this const from QualType ? | |
| 348 | what does numbering mean ? | |
| 481 | nit: move this to top of the loop | |
| 520 | maybe return an llvm::Expected so that we can tell user about the reason why refactoring failed? | |
| 548 | why do you need a shared_ptr here? | |
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 47 | please fix or remove comment | |
| 89 | nit: "wrt to" -> wrt | |
| 105 | remove from struct and pass as a parameter? | |
| 113 | document why this is important, or just inline it? | |
| 114 | don't bother handling the null parent case. When can that happen? We should avoid creating an ExtractionZone struct entirely then. Also none of the callsites bother to check for null... | |
| 133 | nit: consistently one blank line between functions (in this file you alternate between 1/0) | |
| 158 | why are you doing this with a loop, isn't it just {first->begin,last->end}? | |
| 182 | nit: I think this would be clearer as | |
| 226 | avoid shared_ptr unless there's a strong reason. Optional<ExtractionZone> seems fine here? | |
| 258 | there's no need for the WithTypeAndQualifiers=false version, that's just "fn.name" | |
| 271 | this is just initializing public fields, drop the constructor? | |
| 281 | this is just push_back, inline into caller | |
| 286 | again, avoid sharing code between the complicated declaration case, and the trivial callsite case. It makes changes harder and obscures the hard case. | |
| 327 | use printType from AST.h? (You'll want to drop qualifiers/refs before calling that, but it's not at all obvious from the function name here that they're dropped, so that should be at the callsite anyway) | |
| 347 | missing initializers on this and DeclIndex | |
| 348 | Giving a definition of "index" isn't useful here :-) | |
| 353 | what is this constructor for? | |
| 376 | Again, the comment just echoes the code. There is some subtlety here, either explain it ("the index is the size of the map so far, so it increases for each decl found") or ignore it - this isn't too hard to work out. | |
| 411 | please don't add these NOLINT comments everywhere. It's too noisy and we don't do it elsewhere (agree we should silence clang-tidy somehow) | |
| 431 | computing and checking bounds for every node in the same function seems unneccesarily inefficient. Why not override TraverseStmt to set a flag to true when the stmt is a RootStmt, call Base::TraverseStmt then set it to false again after? | |
| 456 | seems you could write this more directly - e.g. it's natural to check for hoisting here. if (DeclaredIn == Inside && ReferencedInPostZone) return false; // needs to be hoisted, not supported if (!ReferencedInZone) continue; // no need to pass as parameter, not referenced if (DeclaredIn == Inside || DeclaredIn == OutsideFunc) continue; // no need to pass as parameter, still accessible | |
| clang-tools-extra/clangd/unittests/TweakTests.cpp | ||
| 611 ↗ | (On Diff #217108) | wait, what? | 
| 615 ↗ | (On Diff #217108) | please clarify this comment (specifically what needs to be fixed?) ideally add the broken test case with a FIXME to fix it | 
| 621 ↗ | (On Diff #217108) | add a more realistic example? I can easily imagine banning this case in the future (extraction of an empty block) | 
| 641 ↗ | (On Diff #217108) | please make the tests use reasonably conventional style - FOO -> Foo, one statement per line, etc | 
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 166 | I've removed the loop, should I still add this? | |
| 182 | Replaced with isRootStmt | |
| 226 | And store a unique_ptr to the optional in ExtractFunction? | |
| 226 | Because ExtractFunction needs to store a pointer/reference to ExtractionZone somehow in prepare and access it in apply. | |
| 271 | ExtractionSemicolonPolicy doesn't have a default constructor. | |
| 333 | In a future patch we will want to use heuristics like has the variable been assigned, was it passed as a non-const reference, was its address taken, etc. | |
| 548 | getExtractionZone creates an Optional<ExtractionZone> which needs to persist from prepare to apply. Is there a better way to store a reference to the ExtractionZone instance inside ExtractFunction? | |
| clang-tools-extra/clangd/unittests/TweakTests.cpp | ||
| 611 ↗ | (On Diff #217108) | For now we don't extract if the selection is an expression. I've added a fixme to change that to sub-expressions only. | 
I think we should go ahead and land this. The only points that I'd really like to see fixed is shared_ptr, mostly because it's a strong signal there's something complicated going on and I don't think (or hope) there is!
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 120 | D66872 has the fix if you'd rather avoid these workarounds | |
| 147 | llvm::DenseSet<const Stmt*> std::set is a pretty terrible data structure unless you really really need order. | |
| 239 | this is also: for (const SelectionTree::Node *N) ExtZone->RootStmts.insert(N->ASTNode.get<Stmt>()); which is shorter and (I think) less obfuscated | |
| 327 | Second half is not done: the suggestion is that it's really confusing where spellType is called that it drops qualifiers, so just I'd inline this into the callsite. | |
| 333 | I think the point is that the QualType in parameter could/should represent the *parameter* type, not the type of the variable being captured. This seems the more natural model (the struct is called Parameter, not CapturedVariable or Argument) but there may be reasons not to do this. | |
| 548 | shared ownership is less common and harder to reason about than exclusive ownership Generally we prefer deciding where ownership should reside (T or Optional<T> or unique_ptr<T>), and where you should have an unowned reference (T& or T*). In this case, findExtractionZone() should ideally return Optional<ExtractionZone>, after prepare() the ExtractFunction class should own it (as a Optional<ExtractionZone>), and apply should pass it around as a const ExtractionZone&. If it turns out ExtractionZone isn't a movable type, then Optional<ExtractionZone> won't work and you'll need unique_ptr<ExtractionZone> instead. But shared_ptr shouldn't be needed regardless. | |
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 147 | lifetime of this field looks complicated can you add some comments on how/when it is initialized and maybe enforce immutability ? | |
| 151 | it seems like you are rather using it to decide whether a statement is inside the zone or not? Could you rather rename it to reflect that and add some comments? | |
| 166 | no need | |
| 170 | nit: if (isa<CXXMethodDecl>(Func)) | |
| 173 | nit: if(isa<FunctionTemplateDecl>(Func)) | |
| 226 | Let's figure it out in the ExtractFunction this can still return an Optional<ExtractionZone> which you can convert into a pointer/reference later on if need be? | |
| 347 | i think you mean index of the first reference to this decl ? | |
| 487 | nit isa<FunctionDecl> instead of dyn_cast | |
| 548 | Then why not just store Optional<ExtractionZone> ? | |
Addressed review comments
| clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp | ||
|---|---|---|
| 120 | Will remove this once D66872 is committed. | |
| 151 | if extracting for(;;) int x = 0; the DeclStmt is inside the zone but not a rootstmt. | |
| 173 | Using Func->isTemplated() | |
| 347 | It can be the index of declaration or the first reference. Since the visitor Visits Decls as well, we will encounter a reference before the Decl only if the Decl is outside the enclosing function. | |
this file will need an overview describing