This is an archive of the discontinued LLVM Phabricator instance.

[Clangd] First version of ExtractFunction
ClosedPublic

Authored by SureYeaah on Jul 31 2019, 10:43 AM.

Details

Summary
  • Only works for extraction from free functions
  • Basic analysis of the code being extracted.
  • Extract to void function
  • Bail out if extracting a return, continue or break.
  • Doesn't hoist decls yet

Event Timeline

SureYeaah created this revision.Jul 31 2019, 10:43 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 31 2019, 10:43 AM

I guess you're looking for design review at this point.

But the most important part of that is the documentation of the high-level structure, which is entirely missing. I can't really infer it from the code because most of the design is (presumably) not implemented at this point :-)

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
37

I think "range" or "region" may be a clearer name here than "target".
(In extract variable, target was used to refer to the destination of extraction)

111

what is the fix you're waiting for?

156

These are important domain structures, can we move them out of the RecursiveASTVisitor? (which should be mostly restricted to managing state of the traversal itself)

157

You've called this "reference", but I think it's actually a "referenceddecl" - you don't store one of these for each reference do you?

190

why does D need to be a vardecl?

191

please don't add messages to asserts that just echo the code. In order of preference:

  • take a reference so it clearly can't be null
  • add a message explaining at a *high level* what variant is violated
  • add the assertion with no message
198

why is this handled specially, instead of being part of the general case of "referring to a decl not visible in the target scope"?

(in particular, if the context is forward-declared elsewhere, what's the problem?

204

DRE is just one of the cases where we want to handle names.

I think it'll be easier to handle more cases by changing this to accept a Decl* and moving the DRE->getDecl() to the caller

208

(why beginloc rather than loc, throughout?)

212

this line seems very suspect, what are you trying to do and why?

215

can't we skip the indirection here and just have something like Ref.markOccurrence(DRE->getBeginLoc()), and avoid dealing with the enum here at all?

SureYeaah marked 2 inline comments as done.Aug 2 2019, 6:58 AM

I guess you're looking for design review at this point.

But the most important part of that is the documentation of the high-level structure, which is entirely missing. I can't really infer it from the code because most of the design is (presumably) not implemented at this point :-)

Aaah. Yes. Sorry. Makes sense.

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
111

The semicolon fix that has already landed now.

212

This just checks whether (DeclLocType, DRELocType) is either (PRETARGET, TARGET) or (TARGET, POSTTARGET). Could explicitly write it if needed.

sammccall added inline comments.Aug 2 2019, 7:06 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
212

Definitely needed, apart from the lack of clarity the current version has UB sometimes (DeclLocType + 1 might not be a valid value)

But by "why" i mostly mean - this looks like the place for recording the location, not the place for deciding not to record the location because some algorithm later may not need it. Seems a lot simpler to record it unconditionally and worry about it later.

SureYeaah updated this revision to Diff 215077.Aug 14 2019, 4:00 AM
SureYeaah marked 11 inline comments as done.

Refactored design and added unit tests.

SureYeaah updated this revision to Diff 215089.Aug 14 2019, 5:37 AM

Removed unrelated changes

SureYeaah updated this revision to Diff 215098.Aug 14 2019, 5:57 AM

Removed debug info

SureYeaah updated this revision to Diff 215131.Aug 14 2019, 8:32 AM

Better Naming

SureYeaah updated this revision to Diff 215574.Aug 16 2019, 6:06 AM

Fixed bug in getLocType

SureYeaah updated this revision to Diff 215615.Aug 16 2019, 9:02 AM

Fixed semicolon bug

kadircet added inline comments.Aug 20 2019, 3:23 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
69

Could you move the definition of RootStmt to class definition? Since it is used in a lot more places than just this one.

69

We only support extraction of RootStmts.

Can you also mention the reasoning/implications ?

72

Can you make this and the computeEnclosingFunction{,Rng} free functions instead of static methods?

75

haven't checked the implementation yet, but naming and the comments seem to be ambiguous.

Do you mean *selected* childrens of Parent, instead of *all*? Because IIUC, Source refers to the part of the function body that is selected, not all of it, right?

223

comments for some of the fields like SemicolonPolicy would be nice.

also most of these looks like implementation details, consider moving internal ones to private

287

could you give more details on what is collected and how?

361

why not move this and 4 following functions into astvisitor?

Main themes are:

  • there are a few places we can reduce scope for the first patch: e.g. template support
  • internally, it'd be clearer to pass data around in structs and avoid complex classes unless the abstraction is important
  • high-level documentation would aid in understanding: for each concept/function: what is the goal, how does it fit into higher-level goals, are there any non-obvious reasons it's done this way
  • naming questions (this can solve some of the same problems as docs, but much more cheaply)
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
8

this file will need an overview describing

  • the refactoring, and (briefly) major user-visible design decisions (e.g. params, return types, hoisting)
  • structure/approach/major parts
38

this name is *also* pretty confusing because of SourceLocation, SourceRange etc classes.

Best option might just be to make up something distinctive, like Zone :-/

40

nit: if you're going to document the values one-by-one, do it in a comment on each value instead

43

nit: we don't use MACROCASE for enums in LLVM (I think). This of course makes namespacing more important, we should consider enum class here.

I'd probably suggest

enum class ZoneRelative { // or some name consistent with above
  Before,
  After,
  Inside,
  External
};
46

This looks like basically a struct (mostly public data members) where all the logic is in the constructor.
This would be more obvious/more consistent with style elsewhere if it were literally a struct and the constructor was instead a function (which could return Expected<ExtractionSourceView> instead of setting flags on the result)

46

nit: not sure view adds anything here. ExtractionSource or ExtractionZone?

51

classify?

60

when is this not Parent->Children.back()?

94

I think you probably want to avoid extracting from lambdas and templates for now, too.

105

nit: Please spell 'range' rather than 'rng' in function names, as it's not a standard abbreviation in clang[d]

108

can we avoid adding template support in the first patch? It's complicated, and I don't think the code below handles it robustly.

For instance, this case will be extracted (despite being dependent), because the reference to T is not seen by the ast visitor, and the type of the decls that are seen are not dependent:

template <typename T>
int zero() { return std::vector<T>().size(); }
114

It's not clear what this function does and why. Can you add a high-level comment?

125

LLVM_FALLTHROUGH

(we support other host compilers: msvc, gcc)

145

what needs to be fixed, and why is it important?

176

what is this function doing, and why?
why is it separated from getParentOfRootStmts?

it seems like we're trying to determine that the selection consists of either a single completely-selected stmt, or a bunch of completely-selected statements with a common parent, and then we represent them as "all the children of a certain parent node".

Splitting the logic into two distant functions, and not documenting this intent, makes it hard to follow.

207

as above, this seems like it would be clearer as a struct.

215

I'd suggest capturing the type as a QualType instead of a string + const + ref flag

When types may need to be re-spelled, we'll need that extra information - the context needed to re-spell is available at render() time, not addParameter() time.

226

We tend to avoid std::set except in the rare cases where we need both sorting and on-the-fly deduplication.

Here, it seems std::vector would be enough, and explicitly sorting an the end would make the intent clearer.

227

Hoisting isn't supported, drop this?

236

as previously discussed, there's not enough (conceptual or implementation) commonality between the call and the declaration, can we have separate renderDefinition() and renderCall() functions?

263

these TODOs don't really describe the structural change that's needed here: we need a way to apply a bunch of edits to the function text (in terms of the original coordinates).

401

I'm confused about the decls to host part - this isn't actually supported right? Take it out of the name?

Or are they detected in order to cause the refactoring to fail?

455

you've got a return type, a comment, and a function name, and they all say the same thing. Can you explain (at a high level) what this is for?

clang-tools-extra/clangd/unittests/TweakTests.cpp
603

the test helpers are really aimed at testing prepare/apply together rather than separately.

These EXPECT_AVAILABLE could be EXPECT_EQ(apply(), ...). If prepare() returns false it'll return "prepare failed" or some string like that.

618

rather than checking in commented-out tests, I'd prefer to check in the tests verifying current incorrect behavior with a FIXME.
Commented out code tends to rot (or often not quite work in the first place), and it's not discoverable

(or just leave the FIXME by itself)

639

maybe add a comment saying why not?

649

this line does nothing?

651

IIRC you can't use raw strings inside a macro (it's legal, but one of the supported compilers chokes on it)

654

please avoid large tests that test lots of things, test them one-at-a-time instead. The intent is clearer, and it's clearer what failures mean.

683

please write these out explicitly instead.
Among other things, the goal in the helpers was that assertion failures should preserve line numbers of examples

SureYeaah marked 33 inline comments as done.

Addressed Review comments

SureYeaah updated this revision to Diff 216805.Aug 23 2019, 4:26 AM
SureYeaah marked 4 inline comments as done.

Addressed more review comments

SureYeaah retitled this revision from [Clangd] Initial prototype version of ExtractFunction to [Clangd] First version of ExtractFunction.Aug 23 2019, 5:57 AM
SureYeaah edited the summary of this revision. (Show Details)
SureYeaah added inline comments.Aug 23 2019, 6:02 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
215

Const will also depend on CapturedSourceInfo. So I'm passing QualType instead of the name as string.

361

Because we would need to pass CapturedSourceInfo and ExtractionZone to the Visitor then?

SureYeaah marked an inline comment as done.Aug 23 2019, 6:07 AM
SureYeaah added inline comments.Aug 23 2019, 6:07 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
361

Ignore.

SureYeaah updated this revision to Diff 216831.Aug 23 2019, 6:53 AM
SureYeaah marked 2 inline comments as done.
SureYeaah edited the summary of this revision. (Show Details)

Removed extra code

SureYeaah updated this revision to Diff 217108.Aug 26 2019, 3:45 AM

Fixed typo

kadircet added inline comments.Aug 26 2019, 5:16 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
30

did you mean no return *type* ?

31

don't understand what you meant here

32

could you elaborate on that one?

You can ignore this comment if it is gonna be explained in the design doc(and you are planning to attach it before landing this revision)

43

Tried hard to figure out what RAV is :D could you rather say RecursiveASTVisitor(RAV) ?

123

IIRC, < operator in SourceLocations are merely for implementing hashing and doesn't really carry less than semantics. Could you rather use SM.isBeforeInSLocAddrSpace?

125

same here

134

Can you rather return a FunctionDecl ?

153

maybe rather return an llvm::Optional/Expected ?

166

I suppose this relies on the fact that "AST contains the nodes ordered by their begin location"? Could you add an assertion for that?

177

what makes this fail-free ?

181

This explains *what* the function is doing, but lacks the *reason why*

Could you explain the reason either in here or in the call site?

182

hasOnlyRootStmtsAsChildren ?

216

why ?

226

why is this function returning a shared_ptr ?

333

why don't we infer this const from QualType ?

348

what does numbering mean ?

481

nit: move this to top of the loop

520

maybe return an llvm::Expected so that we can tell user about the reason why refactoring failed?

548

why do you need a shared_ptr here?

sammccall added inline comments.Aug 26 2019, 5:37 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
47

please fix or remove comment

89

nit: "wrt to" -> wrt

105

remove from struct and pass as a parameter?
This isn't really data

113

document why this is important, or just inline it?

114

don't bother handling the null parent case. When can that happen? We should avoid creating an ExtractionZone struct entirely then.

Also none of the callsites bother to check for null...

133

nit: consistently one blank line between functions (in this file you alternate between 1/0)

158

why are you doing this with a loop, isn't it just {first->begin,last->end}?

182

nit: I think this would be clearer as
canBeRootStmt(const Node*) and write the callsite as llvm::any_of(CommonAnc->Children, canBeRootStmt). Up to you though

226

avoid shared_ptr unless there's a strong reason. Optional<ExtractionZone> seems fine here?

258

there's no need for the WithTypeAndQualifiers=false version, that's just "fn.name"

271

this is just initializing public fields, drop the constructor?
(The callsite is clearer here if you initialize them by name)

281

this is just push_back, inline into caller

286

again, avoid sharing code between the complicated declaration case, and the trivial callsite case. It makes changes harder and obscures the hard case.

327

use printType from AST.h?

(You'll want to drop qualifiers/refs before calling that, but it's not at all obvious from the function name here that they're dropped, so that should be at the callsite anyway)

347

missing initializers on this and DeclIndex

348

Giving a definition of "index" isn't useful here :-)
Rather prefer to say what it's the index of: is it the index of first reference, relative to the other decls?

353

what is this constructor for?

376

Again, the comment just echoes the code. There is some subtlety here, either explain it ("the index is the size of the map so far, so it increases for each decl found") or ignore it - this isn't too hard to work out.

411

please don't add these NOLINT comments everywhere. It's too noisy and we don't do it elsewhere (agree we should silence clang-tidy somehow)

431

computing and checking bounds for every node in the same function seems unneccesarily inefficient.

Why not override TraverseStmt to set a flag to true when the stmt is a RootStmt, call Base::TraverseStmt then set it to false again after?

456

seems you could write this more directly - e.g. it's natural to check for hoisting here.

if (DeclaredIn == Inside && ReferencedInPostZone)
  return false; // needs to be hoisted, not supported
if (!ReferencedInZone)
  continue; // no need to pass as parameter, not referenced
if (DeclaredIn == Inside || DeclaredIn == OutsideFunc)
  continue; // no need to pass as parameter, still accessible
clang-tools-extra/clangd/unittests/TweakTests.cpp
611

wait, what?
expressions that *aren't statements* shouldn't be extracted.
But what's wrong with this one?
(Many useful statements are expressions, such as function calls)

615

please clarify this comment (specifically what needs to be fixed?) ideally add the broken test case with a FIXME to fix it

621

add a more realistic example? I can easily imagine banning this case in the future (extraction of an empty block)

641

please make the tests use reasonably conventional style - FOO -> Foo, one statement per line, etc

SureYeaah updated this revision to Diff 217619.Aug 28 2019, 6:17 AM
SureYeaah marked 45 inline comments as done.

Addressed review comments

SureYeaah added inline comments.Aug 28 2019, 6:18 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
166

I've removed the loop, should I still add this?

182

Replaced with isRootStmt

226

And store a unique_ptr to the optional in ExtractFunction?

226

Because ExtractFunction needs to store a pointer/reference to ExtractionZone somehow in prepare and access it in apply.

271

ExtractionSemicolonPolicy doesn't have a default constructor.

333

In a future patch we will want to use heuristics like has the variable been assigned, was it passed as a non-const reference, was its address taken, etc.

548

getExtractionZone creates an Optional<ExtractionZone> which needs to persist from prepare to apply. Is there a better way to store a reference to the ExtractionZone instance inside ExtractFunction?

clang-tools-extra/clangd/unittests/TweakTests.cpp
611

For now we don't extract if the selection is an expression. I've added a fixme to change that to sub-expressions only.

SureYeaah updated this revision to Diff 217620.Aug 28 2019, 6:19 AM

NFC: Whitespace formatting

sammccall accepted this revision.Aug 28 2019, 7:33 AM

I think we should go ahead and land this. The only points that I'd really like to see fixed is shared_ptr, mostly because it's a strong signal there's something complicated going on and I don't think (or hope) there is!

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
120

D66872 has the fix if you'd rather avoid these workarounds

147

llvm::DenseSet<const Stmt*>

std::set is a pretty terrible data structure unless you really really need order.
(Lots of allocations, lots of pointer-chasing)

239

this is also:

for (const SelectionTree::Node *N)
  ExtZone->RootStmts.insert(N->ASTNode.get<Stmt>());

which is shorter and (I think) less obfuscated

327

Second half is not done: the suggestion is that it's really confusing where spellType is called that it drops qualifiers, so just I'd inline this into the callsite.

333

I think the point is that the QualType in parameter could/should represent the *parameter* type, not the type of the variable being captured.
e.g. it could be const std::string& even if the original variable was just std::string.

This seems the more natural model (the struct is called Parameter, not CapturedVariable or Argument) but there may be reasons not to do this.

548

shared ownership is less common and harder to reason about than exclusive ownership

Generally we prefer deciding where ownership should reside (T or Optional<T> or unique_ptr<T>), and where you should have an unowned reference (T& or T*).

In this case, findExtractionZone() should ideally return Optional<ExtractionZone>, after prepare() the ExtractFunction class should own it (as a Optional<ExtractionZone>), and apply should pass it around as a const ExtractionZone&.

If it turns out ExtractionZone isn't a movable type, then Optional<ExtractionZone> won't work and you'll need unique_ptr<ExtractionZone> instead. But shared_ptr shouldn't be needed regardless.

This revision is now accepted and ready to land.Aug 28 2019, 7:33 AM
kadircet marked an inline comment as done.Aug 28 2019, 7:38 AM
kadircet added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
147

lifetime of this field looks complicated can you add some comments on how/when it is initialized and maybe enforce immutability ?

151

it seems like you are rather using it to decide whether a statement is inside the zone or not?

Could you rather rename it to reflect that and add some comments?

166

no need

170

nit:

if (isa<CXXMethodDecl>(Func))
173

nit:

if(isa<FunctionTemplateDecl>(Func))
226

Let's figure it out in the ExtractFunction this can still return an Optional<ExtractionZone> which you can convert into a pointer/reference later on if need be?

347

i think you mean index of the first reference to this decl ?

487

nit isa<FunctionDecl> instead of dyn_cast

548

Then why not just store Optional<ExtractionZone> ?

SureYeaah marked 14 inline comments as done.

Addressed review comments

clang-tools-extra/clangd/refactor/tweaks/ExtractFunction.cpp
120

Will remove this once D66872 is committed.

151

if extracting

for(;;)
  int x = 0;

the DeclStmt is inside the zone but not a rootstmt.

173

Using Func->isTemplated()

347

It can be the index of declaration or the first reference. Since the visitor Visits Decls as well, we will encounter a reference before the Decl only if the Decl is outside the enclosing function.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptAug 28 2019, 12:36 PM