This is an archive of the discontinued LLVM Phabricator instance.

[Clangd] First version of ExtractFunction
ClosedPublic

Authored by SureYeaah on Jul 31 2019, 10:43 AM.

Details

Summary
  • Only works for extraction from free functions
  • Basic analysis of the code being extracted.
  • Extract to void function
  • Bail out if extracting a return, continue or break.
  • Doesn't hoist decls yet

Diff Detail

Repository
rL LLVM

Event Timeline

SureYeaah created this revision.Jul 31 2019, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2019, 10:43 AM

I guess you're looking for design review at this point.

But the most important part of that is the documentation of the high-level structure, which is entirely missing. I can't really infer it from the code because most of the design is (presumably) not implemented at this point :-)

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
36 ↗(On Diff #212625)

I think "range" or "region" may be a clearer name here than "target".
(In extract variable, target was used to refer to the destination of extraction)

110 ↗(On Diff #212625)

what is the fix you're waiting for?

155 ↗(On Diff #212625)

These are important domain structures, can we move them out of the RecursiveASTVisitor? (which should be mostly restricted to managing state of the traversal itself)

156 ↗(On Diff #212625)

You've called this "reference", but I think it's actually a "referenceddecl" - you don't store one of these for each reference do you?

189 ↗(On Diff #212625)

why does D need to be a vardecl?

190 ↗(On Diff #212625)

please don't add messages to asserts that just echo the code. In order of preference:

  • take a reference so it clearly can't be null
  • add a message explaining at a *high level* what variant is violated
  • add the assertion with no message
197 ↗(On Diff #212625)

why is this handled specially, instead of being part of the general case of "referring to a decl not visible in the target scope"?

(in particular, if the context is forward-declared elsewhere, what's the problem?

203 ↗(On Diff #212625)

DRE is just one of the cases where we want to handle names.

I think it'll be easier to handle more cases by changing this to accept a Decl* and moving the DRE->getDecl() to the caller

207 ↗(On Diff #212625)

(why beginloc rather than loc, throughout?)

211 ↗(On Diff #212625)

this line seems very suspect, what are you trying to do and why?

214 ↗(On Diff #212625)

can't we skip the indirection here and just have something like Ref.markOccurrence(DRE->getBeginLoc()), and avoid dealing with the enum here at all?

SureYeaah marked 2 inline comments as done.Aug 2 2019, 6:58 AM

I guess you're looking for design review at this point.

But the most important part of that is the documentation of the high-level structure, which is entirely missing. I can't really infer it from the code because most of the design is (presumably) not implemented at this point :-)

Aaah. Yes. Sorry. Makes sense.

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
110 ↗(On Diff #212625)

The semicolon fix that has already landed now.

211 ↗(On Diff #212625)

This just checks whether (DeclLocType, DRELocType) is either (PRETARGET, TARGET) or (TARGET, POSTTARGET). Could explicitly write it if needed.

sammccall added inline comments.Aug 2 2019, 7:06 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
211 ↗(On Diff #212625)

Definitely needed, apart from the lack of clarity the current version has UB sometimes (DeclLocType + 1 might not be a valid value)

But by "why" i mostly mean - this looks like the place for recording the location, not the place for deciding not to record the location because some algorithm later may not need it. Seems a lot simpler to record it unconditionally and worry about it later.

SureYeaah updated this revision to Diff 215077.Aug 14 2019, 4:00 AM
SureYeaah marked 11 inline comments as done.

Refactored design and added unit tests.

SureYeaah updated this revision to Diff 215089.Aug 14 2019, 5:37 AM

Removed unrelated changes

SureYeaah updated this revision to Diff 215098.Aug 14 2019, 5:57 AM

Removed debug info

SureYeaah updated this revision to Diff 215131.Aug 14 2019, 8:32 AM

Better Naming

SureYeaah updated this revision to Diff 215574.Aug 16 2019, 6:06 AM

Fixed bug in getLocType

SureYeaah updated this revision to Diff 215615.Aug 16 2019, 9:02 AM

Fixed semicolon bug

kadircet added inline comments.Aug 20 2019, 3:23 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
68 ↗(On Diff #215615)

Could you move the definition of RootStmt to class definition? Since it is used in a lot more places than just this one.

68 ↗(On Diff #215615)

We only support extraction of RootStmts.

Can you also mention the reasoning/implications ?

71 ↗(On Diff #215615)

Can you make this and the computeEnclosingFunction{,Rng} free functions instead of static methods?

74 ↗(On Diff #215615)

haven't checked the implementation yet, but naming and the comments seem to be ambiguous.

Do you mean *selected* childrens of Parent, instead of *all*? Because IIUC, Source refers to the part of the function body that is selected, not all of it, right?

222 ↗(On Diff #215615)

comments for some of the fields like SemicolonPolicy would be nice.

also most of these looks like implementation details, consider moving internal ones to private

286 ↗(On Diff #215615)

could you give more details on what is collected and how?

360 ↗(On Diff #215615)

why not move this and 4 following functions into astvisitor?

Main themes are:

  • there are a few places we can reduce scope for the first patch: e.g. template support
  • internally, it'd be clearer to pass data around in structs and avoid complex classes unless the abstraction is important
  • high-level documentation would aid in understanding: for each concept/function: what is the goal, how does it fit into higher-level goals, are there any non-obvious reasons it's done this way
  • naming questions (this can solve some of the same problems as docs, but much more cheaply)
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
7 ↗(On Diff #215615)

this file will need an overview describing

  • the refactoring, and (briefly) major user-visible design decisions (e.g. params, return types, hoisting)
  • structure/approach/major parts
37 ↗(On Diff #215615)

this name is *also* pretty confusing because of SourceLocation, SourceRange etc classes.

Best option might just be to make up something distinctive, like Zone :-/

39 ↗(On Diff #215615)

nit: if you're going to document the values one-by-one, do it in a comment on each value instead

42 ↗(On Diff #215615)

nit: we don't use MACROCASE for enums in LLVM (I think). This of course makes namespacing more important, we should consider enum class here.

I'd probably suggest

enum class ZoneRelative { // or some name consistent with above
  Before,
  After,
  Inside,
  External
};
45 ↗(On Diff #215615)

This looks like basically a struct (mostly public data members) where all the logic is in the constructor.
This would be more obvious/more consistent with style elsewhere if it were literally a struct and the constructor was instead a function (which could return Expected<ExtractionSourceView> instead of setting flags on the result)

45 ↗(On Diff #215615)

nit: not sure view adds anything here. ExtractionSource or ExtractionZone?

50 ↗(On Diff #215615)

classify?

59 ↗(On Diff #215615)

when is this not Parent->Children.back()?

93 ↗(On Diff #215615)

I think you probably want to avoid extracting from lambdas and templates for now, too.

104 ↗(On Diff #215615)

nit: Please spell 'range' rather than 'rng' in function names, as it's not a standard abbreviation in clang[d]

107 ↗(On Diff #215615)

can we avoid adding template support in the first patch? It's complicated, and I don't think the code below handles it robustly.

For instance, this case will be extracted (despite being dependent), because the reference to T is not seen by the ast visitor, and the type of the decls that are seen are not dependent:

template <typename T>
int zero() { return std::vector<T>().size(); }
113 ↗(On Diff #215615)

It's not clear what this function does and why. Can you add a high-level comment?

124 ↗(On Diff #215615)

LLVM_FALLTHROUGH

(we support other host compilers: msvc, gcc)

144 ↗(On Diff #215615)

what needs to be fixed, and why is it important?

175 ↗(On Diff #215615)

what is this function doing, and why?
why is it separated from getParentOfRootStmts?

it seems like we're trying to determine that the selection consists of either a single completely-selected stmt, or a bunch of completely-selected statements with a common parent, and then we represent them as "all the children of a certain parent node".

Splitting the logic into two distant functions, and not documenting this intent, makes it hard to follow.

206 ↗(On Diff #215615)

as above, this seems like it would be clearer as a struct.

214 ↗(On Diff #215615)

I'd suggest capturing the type as a QualType instead of a string + const + ref flag

When types may need to be re-spelled, we'll need that extra information - the context needed to re-spell is available at render() time, not addParameter() time.

225 ↗(On Diff #215615)

We tend to avoid std::set except in the rare cases where we need both sorting and on-the-fly deduplication.

Here, it seems std::vector would be enough, and explicitly sorting an the end would make the intent clearer.

226 ↗(On Diff #215615)

Hoisting isn't supported, drop this?

235 ↗(On Diff #215615)

as previously discussed, there's not enough (conceptual or implementation) commonality between the call and the declaration, can we have separate renderDefinition() and renderCall() functions?

262 ↗(On Diff #215615)

these TODOs don't really describe the structural change that's needed here: we need a way to apply a bunch of edits to the function text (in terms of the original coordinates).

400 ↗(On Diff #215615)

I'm confused about the decls to host part - this isn't actually supported right? Take it out of the name?

Or are they detected in order to cause the refactoring to fail?

454 ↗(On Diff #215615)

you've got a return type, a comment, and a function name, and they all say the same thing. Can you explain (at a high level) what this is for?

clang-tools-extra/clangd/unittests/TweakTests.cpp
603 ↗(On Diff #215615)

the test helpers are really aimed at testing prepare/apply together rather than separately.

These EXPECT_AVAILABLE could be EXPECT_EQ(apply(), ...). If prepare() returns false it'll return "prepare failed" or some string like that.

618 ↗(On Diff #215615)

rather than checking in commented-out tests, I'd prefer to check in the tests verifying current incorrect behavior with a FIXME.
Commented out code tends to rot (or often not quite work in the first place), and it's not discoverable

(or just leave the FIXME by itself)

639 ↗(On Diff #215615)

maybe add a comment saying why not?

649 ↗(On Diff #215615)

this line does nothing?

651 ↗(On Diff #215615)

IIRC you can't use raw strings inside a macro (it's legal, but one of the supported compilers chokes on it)

654 ↗(On Diff #215615)

please avoid large tests that test lots of things, test them one-at-a-time instead. The intent is clearer, and it's clearer what failures mean.

683 ↗(On Diff #215615)

please write these out explicitly instead.
Among other things, the goal in the helpers was that assertion failures should preserve line numbers of examples

SureYeaah marked 33 inline comments as done.

Addressed Review comments

SureYeaah updated this revision to Diff 216805.Aug 23 2019, 4:26 AM
SureYeaah marked 4 inline comments as done.

Addressed more review comments

SureYeaah retitled this revision from [Clangd] Initial prototype version of ExtractFunction to [Clangd] First version of ExtractFunction.Aug 23 2019, 5:57 AM
SureYeaah edited the summary of this revision. (Show Details)
SureYeaah added inline comments.Aug 23 2019, 6:02 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
214 ↗(On Diff #215615)

Const will also depend on CapturedSourceInfo. So I'm passing QualType instead of the name as string.

360 ↗(On Diff #215615)

Because we would need to pass CapturedSourceInfo and ExtractionZone to the Visitor then?

SureYeaah marked an inline comment as done.Aug 23 2019, 6:07 AM
SureYeaah added inline comments.Aug 23 2019, 6:07 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
360 ↗(On Diff #215615)

Ignore.

SureYeaah updated this revision to Diff 216831.Aug 23 2019, 6:53 AM
SureYeaah marked 2 inline comments as done.
SureYeaah edited the summary of this revision. (Show Details)

Removed extra code

SureYeaah updated this revision to Diff 217108.Aug 26 2019, 3:45 AM

Fixed typo

kadircet added inline comments.Aug 26 2019, 5:16 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
29 ↗(On Diff #217108)

did you mean no return *type* ?

30 ↗(On Diff #217108)

don't understand what you meant here

31 ↗(On Diff #217108)

could you elaborate on that one?

You can ignore this comment if it is gonna be explained in the design doc(and you are planning to attach it before landing this revision)

42 ↗(On Diff #217108)

Tried hard to figure out what RAV is :D could you rather say RecursiveASTVisitor(RAV) ?

122 ↗(On Diff #217108)

IIRC, < operator in SourceLocations are merely for implementing hashing and doesn't really carry less than semantics. Could you rather use SM.isBeforeInSLocAddrSpace?

124 ↗(On Diff #217108)

same here

133 ↗(On Diff #217108)

Can you rather return a FunctionDecl ?

152 ↗(On Diff #217108)

maybe rather return an llvm::Optional/Expected ?

165 ↗(On Diff #217108)

I suppose this relies on the fact that "AST contains the nodes ordered by their begin location"? Could you add an assertion for that?

176 ↗(On Diff #217108)

what makes this fail-free ?

180 ↗(On Diff #217108)

This explains *what* the function is doing, but lacks the *reason why*

Could you explain the reason either in here or in the call site?

181 ↗(On Diff #217108)

hasOnlyRootStmtsAsChildren ?

215 ↗(On Diff #217108)

why ?

225 ↗(On Diff #217108)

why is this function returning a shared_ptr ?

332 ↗(On Diff #217108)

why don't we infer this const from QualType ?

347 ↗(On Diff #217108)

what does numbering mean ?

480 ↗(On Diff #217108)

nit: move this to top of the loop

519 ↗(On Diff #217108)

maybe return an llvm::Expected so that we can tell user about the reason why refactoring failed?

547 ↗(On Diff #217108)

why do you need a shared_ptr here?

sammccall added inline comments.Aug 26 2019, 5:37 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
46 ↗(On Diff #217108)

please fix or remove comment

88 ↗(On Diff #217108)

nit: "wrt to" -> wrt

104 ↗(On Diff #217108)

remove from struct and pass as a parameter?
This isn't really data

112 ↗(On Diff #217108)

document why this is important, or just inline it?

113 ↗(On Diff #217108)

don't bother handling the null parent case. When can that happen? We should avoid creating an ExtractionZone struct entirely then.

Also none of the callsites bother to check for null...

132 ↗(On Diff #217108)

nit: consistently one blank line between functions (in this file you alternate between 1/0)

157 ↗(On Diff #217108)

why are you doing this with a loop, isn't it just {first->begin,last->end}?

181 ↗(On Diff #217108)

nit: I think this would be clearer as
canBeRootStmt(const Node*) and write the callsite as llvm::any_of(CommonAnc->Children, canBeRootStmt). Up to you though

225 ↗(On Diff #217108)

avoid shared_ptr unless there's a strong reason. Optional<ExtractionZone> seems fine here?

257 ↗(On Diff #217108)

there's no need for the WithTypeAndQualifiers=false version, that's just "fn.name"

270 ↗(On Diff #217108)

this is just initializing public fields, drop the constructor?
(The callsite is clearer here if you initialize them by name)

280 ↗(On Diff #217108)

this is just push_back, inline into caller

285 ↗(On Diff #217108)

again, avoid sharing code between the complicated declaration case, and the trivial callsite case. It makes changes harder and obscures the hard case.

326 ↗(On Diff #217108)

use printType from AST.h?

(You'll want to drop qualifiers/refs before calling that, but it's not at all obvious from the function name here that they're dropped, so that should be at the callsite anyway)

346 ↗(On Diff #217108)

missing initializers on this and DeclIndex

347 ↗(On Diff #217108)

Giving a definition of "index" isn't useful here :-)
Rather prefer to say what it's the index of: is it the index of first reference, relative to the other decls?

352 ↗(On Diff #217108)

what is this constructor for?

375 ↗(On Diff #217108)

Again, the comment just echoes the code. There is some subtlety here, either explain it ("the index is the size of the map so far, so it increases for each decl found") or ignore it - this isn't too hard to work out.

410 ↗(On Diff #217108)

please don't add these NOLINT comments everywhere. It's too noisy and we don't do it elsewhere (agree we should silence clang-tidy somehow)

430 ↗(On Diff #217108)

computing and checking bounds for every node in the same function seems unneccesarily inefficient.

Why not override TraverseStmt to set a flag to true when the stmt is a RootStmt, call Base::TraverseStmt then set it to false again after?

455 ↗(On Diff #217108)

seems you could write this more directly - e.g. it's natural to check for hoisting here.

if (DeclaredIn == Inside && ReferencedInPostZone)
  return false; // needs to be hoisted, not supported
if (!ReferencedInZone)
  continue; // no need to pass as parameter, not referenced
if (DeclaredIn == Inside || DeclaredIn == OutsideFunc)
  continue; // no need to pass as parameter, still accessible
clang-tools-extra/clangd/unittests/TweakTests.cpp
611 ↗(On Diff #217108)

wait, what?
expressions that *aren't statements* shouldn't be extracted.
But what's wrong with this one?
(Many useful statements are expressions, such as function calls)

615 ↗(On Diff #217108)

please clarify this comment (specifically what needs to be fixed?) ideally add the broken test case with a FIXME to fix it

621 ↗(On Diff #217108)

add a more realistic example? I can easily imagine banning this case in the future (extraction of an empty block)

641 ↗(On Diff #217108)

please make the tests use reasonably conventional style - FOO -> Foo, one statement per line, etc

SureYeaah updated this revision to Diff 217619.Aug 28 2019, 6:17 AM
SureYeaah marked 45 inline comments as done.

Addressed review comments

SureYeaah added inline comments.Aug 28 2019, 6:18 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
165 ↗(On Diff #217108)

I've removed the loop, should I still add this?

181 ↗(On Diff #217108)

Replaced with isRootStmt

225 ↗(On Diff #217108)

And store a unique_ptr to the optional in ExtractFunction?

225 ↗(On Diff #217108)

Because ExtractFunction needs to store a pointer/reference to ExtractionZone somehow in prepare and access it in apply.

270 ↗(On Diff #217108)

ExtractionSemicolonPolicy doesn't have a default constructor.

332 ↗(On Diff #217108)

In a future patch we will want to use heuristics like has the variable been assigned, was it passed as a non-const reference, was its address taken, etc.

547 ↗(On Diff #217108)

getExtractionZone creates an Optional<ExtractionZone> which needs to persist from prepare to apply. Is there a better way to store a reference to the ExtractionZone instance inside ExtractFunction?

clang-tools-extra/clangd/unittests/TweakTests.cpp
611 ↗(On Diff #217108)

For now we don't extract if the selection is an expression. I've added a fixme to change that to sub-expressions only.

SureYeaah updated this revision to Diff 217620.Aug 28 2019, 6:19 AM

NFC: Whitespace formatting

sammccall accepted this revision.Aug 28 2019, 7:33 AM

I think we should go ahead and land this. The only points that I'd really like to see fixed is shared_ptr, mostly because it's a strong signal there's something complicated going on and I don't think (or hope) there is!

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
119 ↗(On Diff #217620)

D66872 has the fix if you'd rather avoid these workarounds

146 ↗(On Diff #217620)

llvm::DenseSet<const Stmt*>

std::set is a pretty terrible data structure unless you really really need order.
(Lots of allocations, lots of pointer-chasing)

238 ↗(On Diff #217620)

this is also:

for (const SelectionTree::Node *N)
  ExtZone->RootStmts.insert(N->ASTNode.get<Stmt>());

which is shorter and (I think) less obfuscated

326 ↗(On Diff #217108)

Second half is not done: the suggestion is that it's really confusing where spellType is called that it drops qualifiers, so just I'd inline this into the callsite.

332 ↗(On Diff #217108)

I think the point is that the QualType in parameter could/should represent the *parameter* type, not the type of the variable being captured.
e.g. it could be const std::string& even if the original variable was just std::string.

This seems the more natural model (the struct is called Parameter, not CapturedVariable or Argument) but there may be reasons not to do this.

547 ↗(On Diff #217108)

shared ownership is less common and harder to reason about than exclusive ownership

Generally we prefer deciding where ownership should reside (T or Optional<T> or unique_ptr<T>), and where you should have an unowned reference (T& or T*).

In this case, findExtractionZone() should ideally return Optional<ExtractionZone>, after prepare() the ExtractFunction class should own it (as a Optional<ExtractionZone>), and apply should pass it around as a const ExtractionZone&.

If it turns out ExtractionZone isn't a movable type, then Optional<ExtractionZone> won't work and you'll need unique_ptr<ExtractionZone> instead. But shared_ptr shouldn't be needed regardless.

This revision is now accepted and ready to land.Aug 28 2019, 7:33 AM
kadircet marked an inline comment as done.Aug 28 2019, 7:38 AM
kadircet added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
146 ↗(On Diff #217620)

lifetime of this field looks complicated can you add some comments on how/when it is initialized and maybe enforce immutability ?

150 ↗(On Diff #217620)

it seems like you are rather using it to decide whether a statement is inside the zone or not?

Could you rather rename it to reflect that and add some comments?

169 ↗(On Diff #217620)

nit:

if (isa<CXXMethodDecl>(Func))
172 ↗(On Diff #217620)

nit:

if(isa<FunctionTemplateDecl>(Func))
346 ↗(On Diff #217620)

i think you mean index of the first reference to this decl ?

486 ↗(On Diff #217620)

nit isa<FunctionDecl> instead of dyn_cast

165 ↗(On Diff #217108)

no need

225 ↗(On Diff #217108)

Let's figure it out in the ExtractFunction this can still return an Optional<ExtractionZone> which you can convert into a pointer/reference later on if need be?

547 ↗(On Diff #217108)

Then why not just store Optional<ExtractionZone> ?

SureYeaah marked 14 inline comments as done.

Addressed review comments

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
119 ↗(On Diff #217620)

Will remove this once D66872 is committed.

150 ↗(On Diff #217620)

if extracting

for(;;)
  int x = 0;

the DeclStmt is inside the zone but not a rootstmt.

172 ↗(On Diff #217620)

Using Func->isTemplated()

346 ↗(On Diff #217620)

It can be the index of declaration or the first reference. Since the visitor Visits Decls as well, we will encounter a reference before the Decl only if the Decl is outside the enclosing function.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 28 2019, 12:36 PM