This is an archive of the discontinued LLVM Phabricator instance.

[clangd] dummy variable extraction on a function scope
ClosedPublic

Authored by SureYeaah on Jun 25 2019, 8:35 AM.

Details

Summary
  • Added extraction to a dummy variable
  • using auto for the dummy variable type for now
  • Works on a function scope
  • Adding braces to create a compound statement not supported yet
  • added unit tests

Event Timeline

SureYeaah created this revision.Jun 25 2019, 8:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 25 2019, 8:35 AM
SureYeaah retitled this revision from dummy variable extraction on a function scope to [clangd] dummy variable extraction on a function scope.Jun 25 2019, 8:37 AM
kadircet added inline comments.Jun 26 2019, 3:08 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
55

I believe this class is rather used to check if any decl referenced in an expression was declared in a statement, rather then finding those declrefs.

So what about turning it inside out :D ? Something like:

bool isDeclaredIn(const DeclStmt *DS, const Expr *E, SourceManager &SM) {
  // Collects all DelcRefs in an AST node.
  class FindDeclRefsVisitor .... {
  public: 
    bool VisitDeclRefExpr(... *DRE) { DeclRefs.push_back(DRE); return true; }
    std::vector<DelcRefExpr*> DeclRefs;
  };
  FindDeclRefsVisitor X;
  X.TraverseStmt ...
  for(auto *DRE : X.DeclRefs) {
     ...
   }
   return ...;
}
81

looks really nice but we already type all of the member fields and types in the code explicitly anyway, maybe simplify a bit with:

static bool needBraces(....) {
  auto IsNodeInside = [..](SourceRange Body) {
     return M.isPointWithin(...);
  };

 if(ForStmt *Stmt = llvm::dyn_cast_or_null<ForStmt>(Stm))
    return IsNodeInside(Stmt->getBody());
  ...
  ...
}
96

I suppose currently this check is not doing what it claims to do(not checking for whether it is the only statement and/or if there are enclosing braces).

Maybe leave it out of this patch completely?

109

I believe it would be better to have a more generic check that will ensure all of the delcs referenced in Exp are still visible in the context which we'll insert the new statement. WDYT?

Were there any reasons to make it specific for ForStmt ?

120

could you add a few comments explaining the logic?

138

what about a case like:

for(int i=0;;) {
  int z = [[f(i)*10]] + 5;
}

I believe it should be OK to extract in this case right ?

178

maybe extract these into a function that will take VarName, Stm and Exp as parameters and will return an Expected<Replacement>. So that you can easily change "dummy" part later on if we get another way to invoke this functionality.

182

same as above

SureYeaah marked 12 inline comments as done.Jun 26 2019, 6:32 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
55

Aah, I didn't know you could declare a class inside a function. Nice.

96

Right now, we check whether we will need braces and if so, we don't provide an extraction option. Although we don't insert the braces, we still need to check if the extraction would require braces.

The "body" of an if, for, while etc. will either be a CompoundStmt in which case there are already braces or not in which case there are no braces. So if while going up the AST parents, we encounter a CompoundStmt, we stop there. Otherwise, if we encounter an if/for/while etc. Stmt, we we have 2 cases - Either we are in the body (e.g. do <body> while(...); ) or the condition part (e.g. if(...) or for(...)). So we check which of these two cases it is and accordingly decide if we need braces i.e. the body case.

So we don't need to check whether it's the only statement. Does that make sense?

109

The reason why it's specific for ForStmt (at least for now) is because only in ForStmt we can have declarations as well as other expressions in the for(...) part. For all other cases, we don't have such a scenario. WDYT?

138

This check is only for extraction from the for(<Init>; <Cond>; <Inc>) and not the body of the for.

SureYeaah updated this revision to Diff 206663.Jun 26 2019, 7:22 AM
SureYeaah marked 4 inline comments as done.

Refactored code

  • Refactored code as pointed by kadircet
  • Fixed crash for if statements without an else clause
SureYeaah updated this revision to Diff 206674.Jun 26 2019, 8:03 AM

Replaced auto with actual type for clarity

SureYeaah updated this revision to Diff 206676.Jun 26 2019, 8:06 AM

Fixed comment

kadircet added inline comments.Jun 26 2019, 8:12 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
96

ah I see, I was looking at this code in isolation so didn't notice the check for hitting a compoundstmt first. it makes sense now.

109

it is also possible to have a similar issue with

int a = 5, b = [[a + 2]];
123

s/above/just before

159

maybe move this into if (!ParStmt) case ?

187

maybe add a todo for replacing auto with explicit type?

SureYeaah updated this revision to Diff 206688.Jun 26 2019, 8:42 AM
SureYeaah marked 4 inline comments as done.

Added check for multi variable initialization

SureYeaah marked an inline comment as done.Jun 26 2019, 2:14 PM

Please forgive the mix of high-level and low-level comments here.
Happy to discuss further of course!

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
56

nit: why take a DeclStmt instead of just the Decl here?

56

This function name is confusing - it's unclear which of DeclStmt vs Exp is declared in which, and that it's subexpressions of Exp that are being checked

maybe bool referencesLocalDecls? and rename Decl -> Scope, as it provides the meaning of "local"

57

nit: SourceManager is almost universally SM (or something more verbose)

63

nit: seems a little clearer to collect the referenced decls rather than the referencing exprs.
(It will also generalize better if we want to do things like notice referenced local classes, which we don't yet)

74

nit: "ValueDecl" is not usually a very interesting class conceptually, I'd just call this DeclLoc

74

throughout you're using auto a bit too heavily - usually doesn't aid readability when the type is a single word like SourceLocation here and isn't named in the expression

131

Another general style point: LLVM does use short variable names (like N) sometimes when the type unambiguously describes the role of the variable and the scope is small.

That's not the case here: a SelectionTree::Node could be almost anything so I'd suggest TargetExpr or so.

Conversely I think Parent or NewParent would be clearer than ParStmt which repeats more info from the type.

133

I might be missing something, but as I understand:

  • you're proposing moving the expr N out of the scope ParStmt
  • you're testing whether any subexpression of N references anything declared in certain substmts of ParStmt
  • because of "certain substmts" you're only supporting this for some types of ParStmt

why can't we just check for anything declared inside ParStmt, and make this fully general (applied for all stmts)?

158

this needs a much more descriptive name.
maybe getInsertionPoint()?

160

auto* (or const auto*) - we typically spell out the pointer-ness

161

please make this a for loop, you have an initializer, condition and increment

163

so there's a few cases for CurN here:

  • Expr
  • non-Expr Stmts
  • Decl
  • other weird things (TypeLoc, template parameter list, ...)

As a first cut I think we should be a bit more systematic and conservative about the cases where we allow a variable to be inserted far away from the extraction.

I'd suggest that for Expr your "default" behavior is to keep traversing up, for Decl you probably want to stop and give up and similarly for the "everything else" case you want to give up. Maybe there are cases where these are OK/important, but we should probably whitelist those.

For non-Expr Stmts, you're currently blacklisting CaseStmt for traversal through, I think this should be a whitelist instead, just allowing loops for now.

I think this obviates the need for needBraces and isNodeInside for now, unless I'm missing something.

174

here you're traversing the whole Expr to find the referenced decls at each iteration of this loop.
Can you analyse the expr just once, and reuse the list of decls?

187

no need to get the actual source code: length is SM.getFileOffset(ExpRng->End) - SM.getFileOffset(ExpRng->Begin)

187

I think you should maybe handle the case where ExpRng is none (e.g. because something is unexpectedly inside a macro).

If you're sure this can't happen, add an assert with a message.

203

also may need to use & or && as appropriate. FIXME is fine for this too.

209

fix for this has landed

222

this is clever and not really a common idiom in LLVM code, so a bit hard to decipher

usually prefer Stm = ...; return Stm != nullptr;

237

I'm not sure "dummy" is very meaningful to the user here.
"Extract to variable" or "Extract subexpression to variable"?

SureYeaah marked 14 inline comments as done.Jun 27 2019, 10:53 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
174

I thought it'll only be analyzed once but actually it's at max twice in the case where the selected expression is a part of a DeclStmt in the <Init> of a ForStmt.

SureYeaah marked 4 inline comments as done.Jun 28 2019, 9:25 AM
SureYeaah updated this revision to Diff 207084.Jun 28 2019, 9:57 AM

[Clangd] Refactored code

  • Created new class Extract to store information about the expression being extracted.
  • Doesn't fix all of previous comments

Looking for comments on the new class stucture

SureYeaah marked an inline comment as done.Jun 28 2019, 10:34 AM
kadircet added inline comments.Jul 1 2019, 1:58 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
35

most of the methods seem to be taking a SM, why not make it a class field ?

48

maybe some comments about the fields/methods below?

72

this method doesn't use class state, maybe move it out-of-class to a function?

164

store a unique_ptr instead

187

this seems to be using a lot of class state(and some functions are exposed merely for use of this function). why not make this one a class method instead?

Comments re the Extract class.
It seems OK to me, i'm not really sure whether it's an improvement or not after the pieces (needsBraces, checForStmt etc) are removed. Up to you.

(As the previous comments would reduce the scope of the patch, it'd be nice to address those before reviewing further).

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
34

drop "class to store" from the comment

34

this class needs to have a clearer name and purpose.

ExtractionContext is marginally better, maybe there's a more specific unifying concept to be found.

42

Many of these names should be a little more specific about their semantics.
"Node" is the type, and isn't specific enough to know which node it is. getExprNode()?

51

probably call this computeReferencedDecls()? (or as Kadir noted, pull out of the class)

55

please inline trivial functions like this, getExpr(), getNode()

SureYeaah marked 13 inline comments as done.Jul 1 2019, 5:11 AM
SureYeaah added inline comments.
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
72

It uses the expr

sammccall added inline comments.Jul 1 2019, 5:26 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
72

Nevertheless, as it just uses one param, has an obvious standalone meaning, and is called from the constructor, it does seem clearer to me as a helper outside the class.

SureYeaah marked 2 inline comments as done.Jul 1 2019, 8:31 AM
SureYeaah updated this revision to Diff 207522.Jul 2 2019, 4:55 AM

Removed check for braces and fixed code for finding insertionpoint

sammccall added inline comments.Jul 4 2019, 5:01 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
43

nit: StringRef almost always for params you don't mutate

47

define trivial (and even simple) methods inline

66

nit: can you move the tweak to the end, to avoid alternating between the inner logic implementation and the outer interface?

89

I'd suggest naming this something more after the intent than after the structure, as we may want to refine it over time, e.g. isTooTrivialToExtract or isEligibleToExtract

91

Extracting just a declrefexpr (and replacing it with another declrefexpr) doesn't seem useful. Any reason to only do this for functions, rather than all declrefexprs?

91

a syntactically similar case is MemberExpr where isImplicitAccess() is true. (This is referring to a method of the current class, without qualification)

92

you may want to check whether this does the right thing for a function template

121

remove debug dumping

123

Can we be a bit more explicit about the control flow and boolean initialization here?
e.g. if (isAFunctionRef(Expr)) return; if (InsertionPoint = computeInsertionPoint()) Extractable = true;

136

The name doesn't really match the behavior here - there are lots of other reasons we couldn't insert before a point.
this sounds more like ExtractionContext::exprIsValidOutside or something?

138

this behavior is not at all obvious at the callsite. I'd suggest making the caller check for null, and taking a reference.

159

The scope of this patch is growing in ways that don't seem to be related to the review.

Can you revert the switch-related changes and move them to another patch?

166

this is the Parent, not an Ancestor, I think?

174

this describes what the code is doing, but that's fairly obvious from the code. It would be useful to say why instead.
e.g. "We only allow insertion inside a CompoundStmt. It must be written in the file, not a macro."

179

nit: please use braces consistently within an else-if chain

182

I think this is also new in this revision.
Can we move it to a followup patch? I'm not sure exactly what you're trying to do, but suspect there's a better way/place for it.

184

you're still traversing upwards through arbitrary AST nodes with a blacklist - as discussed previously this doesn't seem safe

220

this FIXME probably belongs on the isEligibleToExtract or similar function

221

it's not clear what the bug or fix is for this

SureYeaah updated this revision to Diff 208174.Jul 5 2019, 7:09 AM
SureYeaah marked 21 inline comments as done.

Added whitelist for computeInsertionPoint

SureYeaah added inline comments.Jul 5 2019, 9:55 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
91

Extracting

int a = [[f]]();

yields

auto dummy = f;
int a = dummy();

I thought this should be prevented. But now I see there's no need to do that.
I'll add a triviality check if needed in the next patch.

sammccall accepted this revision.Jul 8 2019, 9:48 AM

Awesome! Do you have commit access?

clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
38

nit: drop comments that just echo the name, like this one

83

why a delete expression? if this generalises to "expression of type void" then you can check for that directly

91

sure - I just mean int a = b; -> auto dummy = b; int a = dummy; is equally pointless, so I wasn't sure about the function check. But fine to defer.

102

nit: remove extra parens

108

nit: a variable reference out of scope

135

why?

138

can you add a comment explaining approximately why many (most) stmts are allowed but some others cause problems?

188

nit: trailing space? should this be a newline?

This revision is now accepted and ready to land.Jul 8 2019, 9:48 AM
This revision was automatically updated to reflect the committed changes.
SureYeaah marked 8 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptJul 9 2019, 3:12 AM
SureYeaah added inline comments.Jul 9 2019, 3:13 AM
clang-tools-extra/clangd/refactor/tweaks/ExtractVariable.cpp
188

Making it a newline makes writing tests harder.

thakis added a subscriber: thakis.Jul 9 2019, 6:57 AM
thakis added inline comments.
clang-tools-extra/trunk/clangd/refactor/tweaks/CMakeLists.txt
20 ↗(On Diff #208626)

(nit: keep in alphabetical order)