This is an archive of the discontinued LLVM Phabricator instance.

[clangd] dummy variable extraction on a function scope
ClosedPublic

Authored by SureYeaah on Jun 25 2019, 8:35 AM.

Details

Summary
  • Added extraction to a dummy variable
  • using auto for the dummy variable type for now
  • Works on a function scope
  • Adding braces to create a compound statement not supported yet
  • added unit tests

Diff Detail

Repository
rL LLVM

Event Timeline

SureYeaah created this revision.Jun 25 2019, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2019, 8:35 AM
SureYeaah retitled this revision from dummy variable extraction on a function scope to [clangd] dummy variable extraction on a function scope.Jun 25 2019, 8:37 AM
kadircet added inline comments.Jun 26 2019, 3:08 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
54 ↗(On Diff #206460)

I believe this class is rather used to check if any decl referenced in an expression was declared in a statement, rather then finding those declrefs.

So what about turning it inside out :D ? Something like:

bool isDeclaredIn(const DeclStmt *DS, const Expr *E, SourceManager &SM) {
  // Collects all DelcRefs in an AST node.
  class FindDeclRefsVisitor .... {
  public: 
    bool VisitDeclRefExpr(... *DRE) { DeclRefs.push_back(DRE); return true; }
    std::vector<DelcRefExpr*> DeclRefs;
  };
  FindDeclRefsVisitor X;
  X.TraverseStmt ...
  for(auto *DRE : X.DeclRefs) {
     ...
   }
   return ...;
}
80 ↗(On Diff #206460)

looks really nice but we already type all of the member fields and types in the code explicitly anyway, maybe simplify a bit with:

static bool needBraces(....) {
  auto IsNodeInside = [..](SourceRange Body) {
     return M.isPointWithin(...);
  };

 if(ForStmt *Stmt = llvm::dyn_cast_or_null<ForStmt>(Stm))
    return IsNodeInside(Stmt->getBody());
  ...
  ...
}
95 ↗(On Diff #206460)

I suppose currently this check is not doing what it claims to do(not checking for whether it is the only statement and/or if there are enclosing braces).

Maybe leave it out of this patch completely?

108 ↗(On Diff #206460)

I believe it would be better to have a more generic check that will ensure all of the delcs referenced in Exp are still visible in the context which we'll insert the new statement. WDYT?

Were there any reasons to make it specific for ForStmt ?

119 ↗(On Diff #206460)

could you add a few comments explaining the logic?

137 ↗(On Diff #206460)

what about a case like:

for(int i=0;;) {
  int z = [[f(i)*10]] + 5;
}

I believe it should be OK to extract in this case right ?

177 ↗(On Diff #206460)

maybe extract these into a function that will take VarName, Stm and Exp as parameters and will return an Expected<Replacement>. So that you can easily change "dummy" part later on if we get another way to invoke this functionality.

181 ↗(On Diff #206460)

same as above

SureYeaah marked 12 inline comments as done.Jun 26 2019, 6:32 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
54 ↗(On Diff #206460)

Aah, I didn't know you could declare a class inside a function. Nice.

95 ↗(On Diff #206460)

Right now, we check whether we will need braces and if so, we don't provide an extraction option. Although we don't insert the braces, we still need to check if the extraction would require braces.

The "body" of an if, for, while etc. will either be a CompoundStmt in which case there are already braces or not in which case there are no braces. So if while going up the AST parents, we encounter a CompoundStmt, we stop there. Otherwise, if we encounter an if/for/while etc. Stmt, we we have 2 cases - Either we are in the body (e.g. do <body> while(...); ) or the condition part (e.g. if(...) or for(...)). So we check which of these two cases it is and accordingly decide if we need braces i.e. the body case.

So we don't need to check whether it's the only statement. Does that make sense?

108 ↗(On Diff #206460)

The reason why it's specific for ForStmt (at least for now) is because only in ForStmt we can have declarations as well as other expressions in the for(...) part. For all other cases, we don't have such a scenario. WDYT?

137 ↗(On Diff #206460)

This check is only for extraction from the for(<Init>; <Cond>; <Inc>) and not the body of the for.

SureYeaah updated this revision to Diff 206663.Jun 26 2019, 7:22 AM
SureYeaah marked 4 inline comments as done.

Refactored code

  • Refactored code as pointed by kadircet
  • Fixed crash for if statements without an else clause
SureYeaah updated this revision to Diff 206674.Jun 26 2019, 8:03 AM

Replaced auto with actual type for clarity

SureYeaah updated this revision to Diff 206676.Jun 26 2019, 8:06 AM

Fixed comment

kadircet added inline comments.Jun 26 2019, 8:12 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
122 ↗(On Diff #206663)

s/above/just before

158 ↗(On Diff #206663)

maybe move this into if (!ParStmt) case ?

186 ↗(On Diff #206663)

maybe add a todo for replacing auto with explicit type?

95 ↗(On Diff #206460)

ah I see, I was looking at this code in isolation so didn't notice the check for hitting a compoundstmt first. it makes sense now.

108 ↗(On Diff #206460)

it is also possible to have a similar issue with

int a = 5, b = [[a + 2]];
SureYeaah updated this revision to Diff 206688.Jun 26 2019, 8:42 AM
SureYeaah marked 4 inline comments as done.

Added check for multi variable initialization

SureYeaah marked an inline comment as done.Jun 26 2019, 2:14 PM

Please forgive the mix of high-level and low-level comments here.
Happy to discuss further of course!

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
55 ↗(On Diff #206688)

nit: why take a DeclStmt instead of just the Decl here?

55 ↗(On Diff #206688)

This function name is confusing - it's unclear which of DeclStmt vs Exp is declared in which, and that it's subexpressions of Exp that are being checked

maybe bool referencesLocalDecls? and rename Decl -> Scope, as it provides the meaning of "local"

56 ↗(On Diff #206688)

nit: SourceManager is almost universally SM (or something more verbose)

62 ↗(On Diff #206688)

nit: seems a little clearer to collect the referenced decls rather than the referencing exprs.
(It will also generalize better if we want to do things like notice referenced local classes, which we don't yet)

73 ↗(On Diff #206688)

nit: "ValueDecl" is not usually a very interesting class conceptually, I'd just call this DeclLoc

73 ↗(On Diff #206688)

throughout you're using auto a bit too heavily - usually doesn't aid readability when the type is a single word like SourceLocation here and isn't named in the expression

130 ↗(On Diff #206688)

Another general style point: LLVM does use short variable names (like N) sometimes when the type unambiguously describes the role of the variable and the scope is small.

That's not the case here: a SelectionTree::Node could be almost anything so I'd suggest TargetExpr or so.

Conversely I think Parent or NewParent would be clearer than ParStmt which repeats more info from the type.

132 ↗(On Diff #206688)

I might be missing something, but as I understand:

  • you're proposing moving the expr N out of the scope ParStmt
  • you're testing whether any subexpression of N references anything declared in certain substmts of ParStmt
  • because of "certain substmts" you're only supporting this for some types of ParStmt

why can't we just check for anything declared inside ParStmt, and make this fully general (applied for all stmts)?

157 ↗(On Diff #206688)

this needs a much more descriptive name.
maybe getInsertionPoint()?

159 ↗(On Diff #206688)

auto* (or const auto*) - we typically spell out the pointer-ness

160 ↗(On Diff #206688)

please make this a for loop, you have an initializer, condition and increment

162 ↗(On Diff #206688)

so there's a few cases for CurN here:

  • Expr
  • non-Expr Stmts
  • Decl
  • other weird things (TypeLoc, template parameter list, ...)

As a first cut I think we should be a bit more systematic and conservative about the cases where we allow a variable to be inserted far away from the extraction.

I'd suggest that for Expr your "default" behavior is to keep traversing up, for Decl you probably want to stop and give up and similarly for the "everything else" case you want to give up. Maybe there are cases where these are OK/important, but we should probably whitelist those.

For non-Expr Stmts, you're currently blacklisting CaseStmt for traversal through, I think this should be a whitelist instead, just allowing loops for now.

I think this obviates the need for needBraces and isNodeInside for now, unless I'm missing something.

173 ↗(On Diff #206688)

here you're traversing the whole Expr to find the referenced decls at each iteration of this loop.
Can you analyse the expr just once, and reuse the list of decls?

186 ↗(On Diff #206688)

no need to get the actual source code: length is SM.getFileOffset(ExpRng->End) - SM.getFileOffset(ExpRng->Begin)

186 ↗(On Diff #206688)

I think you should maybe handle the case where ExpRng is none (e.g. because something is unexpectedly inside a macro).

If you're sure this can't happen, add an assert with a message.

202 ↗(On Diff #206688)

also may need to use & or && as appropriate. FIXME is fine for this too.

208 ↗(On Diff #206688)

fix for this has landed

221 ↗(On Diff #206688)

this is clever and not really a common idiom in LLVM code, so a bit hard to decipher

usually prefer Stm = ...; return Stm != nullptr;

236 ↗(On Diff #206688)

I'm not sure "dummy" is very meaningful to the user here.
"Extract to variable" or "Extract subexpression to variable"?

SureYeaah marked 14 inline comments as done.Jun 27 2019, 10:53 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
173 ↗(On Diff #206688)

I thought it'll only be analyzed once but actually it's at max twice in the case where the selected expression is a part of a DeclStmt in the <Init> of a ForStmt.

SureYeaah marked 4 inline comments as done.Jun 28 2019, 9:25 AM
SureYeaah updated this revision to Diff 207084.Jun 28 2019, 9:57 AM

[Clangd] Refactored code

  • Created new class Extract to store information about the expression being extracted.
  • Doesn't fix all of previous comments

Looking for comments on the new class stucture

SureYeaah marked an inline comment as done.Jun 28 2019, 10:34 AM
kadircet added inline comments.Jul 1 2019, 1:58 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
34 ↗(On Diff #207084)

most of the methods seem to be taking a SM, why not make it a class field ?

47 ↗(On Diff #207084)

maybe some comments about the fields/methods below?

71 ↗(On Diff #207084)

this method doesn't use class state, maybe move it out-of-class to a function?

163 ↗(On Diff #207084)

store a unique_ptr instead

186 ↗(On Diff #207084)

this seems to be using a lot of class state(and some functions are exposed merely for use of this function). why not make this one a class method instead?

Comments re the Extract class.
It seems OK to me, i'm not really sure whether it's an improvement or not after the pieces (needsBraces, checForStmt etc) are removed. Up to you.

(As the previous comments would reduce the scope of the patch, it'd be nice to address those before reviewing further).

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
33 ↗(On Diff #207084)

drop "class to store" from the comment

33 ↗(On Diff #207084)

this class needs to have a clearer name and purpose.

ExtractionContext is marginally better, maybe there's a more specific unifying concept to be found.

41 ↗(On Diff #207084)

Many of these names should be a little more specific about their semantics.
"Node" is the type, and isn't specific enough to know which node it is. getExprNode()?

50 ↗(On Diff #207084)

probably call this computeReferencedDecls()? (or as Kadir noted, pull out of the class)

54 ↗(On Diff #207084)

please inline trivial functions like this, getExpr(), getNode()

SureYeaah marked 13 inline comments as done.Jul 1 2019, 5:11 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
71 ↗(On Diff #207084)

It uses the expr

sammccall added inline comments.Jul 1 2019, 5:26 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
71 ↗(On Diff #207084)

Nevertheless, as it just uses one param, has an obvious standalone meaning, and is called from the constructor, it does seem clearer to me as a helper outside the class.

SureYeaah marked 2 inline comments as done.Jul 1 2019, 8:31 AM
SureYeaah updated this revision to Diff 207522.Jul 2 2019, 4:55 AM

Removed check for braces and fixed code for finding insertionpoint

sammccall added inline comments.Jul 4 2019, 5:01 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
42 ↗(On Diff #207522)

nit: StringRef almost always for params you don't mutate

46 ↗(On Diff #207522)

define trivial (and even simple) methods inline

65 ↗(On Diff #207522)

nit: can you move the tweak to the end, to avoid alternating between the inner logic implementation and the outer interface?

88 ↗(On Diff #207522)

I'd suggest naming this something more after the intent than after the structure, as we may want to refine it over time, e.g. isTooTrivialToExtract or isEligibleToExtract

90 ↗(On Diff #207522)

Extracting just a declrefexpr (and replacing it with another declrefexpr) doesn't seem useful. Any reason to only do this for functions, rather than all declrefexprs?

90 ↗(On Diff #207522)

a syntactically similar case is MemberExpr where isImplicitAccess() is true. (This is referring to a method of the current class, without qualification)

91 ↗(On Diff #207522)

you may want to check whether this does the right thing for a function template

120 ↗(On Diff #207522)

remove debug dumping

122 ↗(On Diff #207522)

Can we be a bit more explicit about the control flow and boolean initialization here?
e.g. if (isAFunctionRef(Expr)) return; if (InsertionPoint = computeInsertionPoint()) Extractable = true;

135 ↗(On Diff #207522)

The name doesn't really match the behavior here - there are lots of other reasons we couldn't insert before a point.
this sounds more like ExtractionContext::exprIsValidOutside or something?

137 ↗(On Diff #207522)

this behavior is not at all obvious at the callsite. I'd suggest making the caller check for null, and taking a reference.

158 ↗(On Diff #207522)

The scope of this patch is growing in ways that don't seem to be related to the review.

Can you revert the switch-related changes and move them to another patch?

165 ↗(On Diff #207522)

this is the Parent, not an Ancestor, I think?

173 ↗(On Diff #207522)

this describes what the code is doing, but that's fairly obvious from the code. It would be useful to say why instead.
e.g. "We only allow insertion inside a CompoundStmt. It must be written in the file, not a macro."

178 ↗(On Diff #207522)

nit: please use braces consistently within an else-if chain

181 ↗(On Diff #207522)

I think this is also new in this revision.
Can we move it to a followup patch? I'm not sure exactly what you're trying to do, but suspect there's a better way/place for it.

183 ↗(On Diff #207522)

you're still traversing upwards through arbitrary AST nodes with a blacklist - as discussed previously this doesn't seem safe

219 ↗(On Diff #207522)

this FIXME probably belongs on the isEligibleToExtract or similar function

220 ↗(On Diff #207522)

it's not clear what the bug or fix is for this

SureYeaah updated this revision to Diff 208174.Jul 5 2019, 7:09 AM
SureYeaah marked 21 inline comments as done.

Added whitelist for computeInsertionPoint

SureYeaah added inline comments.Jul 5 2019, 9:55 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
90 ↗(On Diff #207522)

Extracting

int a = [[f]]();

yields

auto dummy = f;
int a = dummy();

I thought this should be prevented. But now I see there's no need to do that.
I'll add a triviality check if needed in the next patch.

sammccall accepted this revision.Jul 8 2019, 9:48 AM

Awesome! Do you have commit access?

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
37 ↗(On Diff #208174)

nit: drop comments that just echo the name, like this one

82 ↗(On Diff #208174)

why a delete expression? if this generalises to "expression of type void" then you can check for that directly

101 ↗(On Diff #208174)

nit: remove extra parens

107 ↗(On Diff #208174)

nit: a variable reference out of scope

134 ↗(On Diff #208174)

why?

137 ↗(On Diff #208174)

can you add a comment explaining approximately why many (most) stmts are allowed but some others cause problems?

187 ↗(On Diff #208174)

nit: trailing space? should this be a newline?

90 ↗(On Diff #207522)

sure - I just mean int a = b; -> auto dummy = b; int a = dummy; is equally pointless, so I wasn't sure about the function check. But fine to defer.

This revision is now accepted and ready to land.Jul 8 2019, 9:48 AM
This revision was automatically updated to reflect the committed changes.
SureYeaah marked 8 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 3:12 AM
SureYeaah added inline comments.Jul 9 2019, 3:13 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
187 ↗(On Diff #208174)

Making it a newline makes writing tests harder.

thakis added a subscriber: thakis.Jul 9 2019, 6:57 AM
thakis added inline comments.
clang-tools-extra/trunk/clangd/refactor/tweaks/CMakeLists.txt
20

(nit: keep in alphabetical order)