This is an archive of the discontinued LLVM Phabricator instance.

[clangd] A code action to swap branches of an if statement
ClosedPublic

Authored by ilya-biryukov on Jan 11 2019, 10:25 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

ilya-biryukov created this revision.Jan 11 2019, 10:25 AM

A deliberately simple syntactic transformation. Missing tests, but should work very reliably. To serve as an reference point for writing similar actions.

Hi Ilya, this seems really useful for people learning how to implement their custom actions!

clangd/refactor/actions/SwapIfBranches.cpp
36 ↗(On Diff #181313)

It seems to me we don't find If token whenever
CursorLoc == location of 'f' of the If token

For example if there's "if" starting at location 1:1 I think we proceed like this (hope my pseudocode is clear):

  1. If->getIfLoc() returns SourceLocation{1:1}
  2. we construct SourceRange{begin = 1:1, end = 1:1}
  3. toHalfOpenFileRange() returns SourceRange{1:1, 1:2}
  4. the condition for SourceLocation L in halfOpenRangeContains() is 1:1 <= LOffset && LOffset < 1:2 which is only ever true for L == 1:1

Do I understand it right?

62 ↗(On Diff #181313)

Just a typo in comment s/brances/branches/

ilya-biryukov added inline comments.Jan 15 2019, 3:24 AM
clangd/refactor/actions/SwapIfBranches.cpp
36 ↗(On Diff #181313)

In step (3) the constructed range is intended to be SourceRange{1:1, 1:3}, i.e. it should cover both chars of the if keyword.
I haven't checked it, though, will make sure that's the case when actually testing this.

  • Update after changes to parent revision
  • Fix a typo in the id of the SwapIfBranches
  • Rebase after parent change
  • Update to reflect changes in parent revision
  • Remove the header file

Cool! Adopting this one as the simplest tweak for "how should tweaks do X" questions.

This depends on helpers not (yet) present in this patch, so not commenting on them now.

Need unit tests for tweaks. Something like this should work:

Annotations Test(R"cpp(void foo() {
  [[if]] (1) { return; } else { continue; }
})cpp");
auto AST = TestTU(Test.code()).build();
auto Sel = Tweak::Selection::create(Test.Code(), AST, Test.range());
auto T = prepareTweak("SwapIfBranches", Sel);
ASSERT_TRUE(T) << T.takeError();
auto Edits = T.apply(Sel);
ASSERT_TRUE(Edits) << Edits.takeError();
auto After = applyAllReplacements(Test.code(), *Edits);
EXPECT_EQ(After, R"cpp(void foo() {
  if (1) { continue; } else { return; }
})cpp");

Probably want to wrap it up into a table driven test once we have a few.

clangd/refactor/tweaks/SwapIfBranches.cpp
34 ↗(On Diff #182321)

The before/after is useful, we should probably have it for all tweaks if possible.
It'd also be useful to notate where the cursor can be to trigger the action. Partly because it forces us to consider this!

e.g. (not sure if this actually matches the logic you want, just an example)

Before:
  if (foo) { return 10; } else { continue; }
  ^^^^^^^^^^            ^^^^^^^^           ^
After:
  ...
46 ↗(On Diff #182321)

I think prepare() should just verify:

  • cursor is in the right place
  • else statement exists and isn't an else if
  • both branches are compound statements (for now)
  • (maybe) relevant locations (if keyword, else keyword, braces) are not macro expansions

and then record the relevant source locations (or just the IfStmt*)

We may be able to get away with doing all the work in prepare(), but it's not a good model for future tweaks. (And it is at least somewhat wasteful on a hot path).

50 ↗(On Diff #182321)

(Mostly this is food for thought for later - we shouldn't try to solve everything in this patch)

Two efficiency problems here:

  • doing one traversal per tweak is wasteful (when we have >1 tweaks, but worth at least *thinking* about)
  • traversing the entire AST rather than just the nodes over the cursor is bad news (though "over the cursor" may not be trivial to define)

Idea for how to "framework" this problem away:
Add vector<DynTypedNode> SelectedNodes to the inputs, the idea being that this is the stack from the narrowest Expr under the cursor to the TranslationUnitDecl at the top.
This can be produced by a RecursiveASTVisitor (that either traverses the whole AST, or just the bits under the cursor if that's simple enough).
Tweaks would iterate over the nodes from narrowest to broadest, deciding whether to select this node for processing, continue iterating, or bail out.

Matching in checks are then pretty easy to write, we haven't removed too much flexibility in flow control, and it's pretty hard to write a slow check.

There are some complications:

  • we need access to parents somehow (e.g. consider the case of adding NS qualifiers, we want to match on names but then traverse up to the containing declrefexpr to get the nested namespace loc)
  • the interesting nodes may be a tree rather than a stack, because nodes overlap in the source. We could store a postorder traversal... but this makes e.g. "bail out when you hit a functiondecl" harder - you probably want to bail out of the current branch only.
  • things get complicated when we think about macros - depending on the semantics we want, it may be hard for the framework to prune parts of the tree that aren't under the cursor withouth traversing the whole AST.
84 ↗(On Diff #182321)

isa<CompoundStmt>

  • Move source code helpers to this patch
  • Update id of swap branches in tests
  • Rebase
  • Use helper to avoid creating RewriteBuffer
  • Use std::string to fix stack corruption
  • Use a helper to avoid creating RewriteBuffer
ilya-biryukov marked 4 inline comments as done.
  • Fix a typo in a comment: isValidRange -> isValidFileRange
  • Make action available under 'else' keywords and conditions
  • Move the logic of creating replacements to apply
  • Use llvm::isa instead of dyn_cast_or_null
ilya-biryukov marked an inline comment as done.Jan 22 2019, 7:13 AM
ilya-biryukov added inline comments.
clangd/refactor/tweaks/SwapIfBranches.cpp
34 ↗(On Diff #182321)

LG, updated the comment and ranges per suggestion.
I still not sure if the action should span the whole condition, but going with the suggested ranges anyway. We can tweak them later if the resulting ranges turn out to be problematic.

However, note that I excluded { and } from available ranges, but only because I'm lazy and this requires declaring some more local vars, etc.

46 ↗(On Diff #182321)

Done.
The only reason I thought applying replacements was a sensible idea is because it would fail in some weird macro cases (only when resulting ranges overlap, though). And it seemed better to avoid showing the action rather than showing it and failing later.

50 ↗(On Diff #182321)

I mostly agree, however I still wonder whether that would be inefficient in some cases. E.g. consider a corner if the input selection from LSP contains the whole file? I see two options there: (1) putting all nodes of a file into vector<DynTypedNode>, (2) putting only the top-level TranslationUnitDecl there.

It seems easy to end up with (1) to make the interface useful, however we probably prefer (2) because otherwise we're back into the worst-case scenario, i.e. "every tweak traverses all the nodes"

we need access to parents somehow (e.g. consider the case of adding NS qualifiers, we want to match on names but then traverse up to the containing declrefexpr to get the nested namespace loc)

I wonder if it's possible to instead write the checks so that they only look at the children of the nodes in a vector? My bet is that almost all of the checks only need to look one or two levels down, so this shouldn't turn into inefficiency.

I initially thought about providing a very limited set of AST nodes as inputs for the actions that are computed efficiently. Specifically, I thought about having:

struct Selection {
  Decl *DeclUnderCursor; // the innermost declaration under CursorLoc.
  Stmt *StmtUnderCursor; // the innermost statement under CursorLoc.
  Expr *ExprUnderCursor; // the innermost expression under CursorLoc.

  // For some tweaks (extract var, extract method) we will also need these:
  Expr *SelectedExpression; // expression exactly matching selection.
  vector<Stmt*> SelectedStatements; // a list of statements exactly matching 
selection.
};

The checks would only look at those few statements, e.g. "Swap If" would try dyn_cast_or_null<Stmt*>(S.StmtUnderCursor) and do the same checks for cursor locations.

We could expand the set of inputs as needed. This would mean that we don't have to design the framework, but that also mean writing some checks would involve some work in the shared code that computes these inputs.

I bet that complexity-wise we would end up with simpler checks and more complicated shared code to compute the inputs. OTOH, we won't need to struggle with the limitations of the AST to design the framework that would properly handle all interesting cases without doing extra work.

84 ↗(On Diff #182321)

Done.
I used dyn_cast_or_null, because isa expectes the arguments to be non-null, therefore we need an extra check before calling it. So the former looked better to me, but I'm actually ok either way.

  • Update the license header
sammccall accepted this revision.Jan 29 2019, 9:33 AM
sammccall added inline comments.
clang-tools-extra/clangd/SourceCode.h
63 ↗(On Diff #184041)

I think the semantics of the locations (expansion vs spelling) need to be spelled out here, at least in the comment.

clang-tools-extra/clangd/refactor/tweaks/SwapIfBranches.cpp
60 ↗(On Diff #184041)

what's the case where the condition is null?

63 ↗(On Diff #184041)

the explicit checks for equality at the endpoint seem... odd
do you need a halfOpenRangeTouches() or so?

101 ↗(On Diff #184041)

do this in apply()?

This is an example refactoring, if we have to add a bunch of checks because it fires too often then we should try to solve that systematically.

clang-tools-extra/unittests/clangd/TweakTests.cpp
46 ↗(On Diff #184041)

fixture has no state, can these be free functions?

116 ↗(On Diff #184041)

why StringLiteral here?

140 ↗(On Diff #184041)

checkTransform(ID, Input, Output), to avoid mixing styles?

clangd/refactor/tweaks/SwapIfBranches.cpp
50 ↗(On Diff #182321)

It seems easy to end up with (1) to make the interface useful, however we probably prefer (2) because otherwise we're back into the worst-case scenario, i.e. "every tweak traverses all the nodes"

I think the principled version of 2) is something like: for every node, find its own range, and its expanded range until the next non-comment token. e.g.:

                 #######################                
foo(); /* bar */ if (x) { y } else { z } /*baz*/ abort()
      ###########################################

The directly interesting nodes are those whose range intersects with the selection, and whose expanded range contains the selection, and none of whose children contains the selection.

Or something like that. I think you'd be able to invert the fuzziness here (do range-expansion on the selection endpoints instead of the nodes) to make the checks cheap.

This revision is now accepted and ready to land.Jan 29 2019, 9:33 AM
ilya-biryukov marked 8 inline comments as done.
  • Remove Dummy.cpp
  • Add halfOpenRangeTouches
  • Add a comment about file vs expansion locations
  • Move range manipulations with else and then to apply()
  • Remove test fixture, turn test member functions into free functions
  • Add checkTransform
  • Replace a null check for getCond() with an isValid() check for the corresponding location.
Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 1:19 PM
ilya-biryukov added inline comments.Jan 31 2019, 1:20 PM
clang-tools-extra/clangd/SourceCode.h
63 ↗(On Diff #184041)

Added a small comment, but it could probably be improved. Let me know what's missing.

clang-tools-extra/clangd/refactor/tweaks/SwapIfBranches.cpp
60 ↗(On Diff #184041)

I initially thought of invalid parse trees (i.e. if() {}), but it turns out this never happens.

From experience, it's better to assume everything can be null in the AST. This saves a lot of time chasing rare null dereferences later and a lot of time investigating whether something is null in a particular case we're facing.

In our particular case IfStmt seems to always have a condition, but the condition will have an invalid location when it's empty. I've added a test that the action still works (I see no reason why it shouldn't if we assume the parser recovery is good). Happy to discuss and adjust, though, let me know what you think.

63 ↗(On Diff #184041)

Done. Thanks for the suggestion, this does make the code much clearer.

101 ↗(On Diff #184041)

Done.

This goes back to a similar comment about precomputing the replacements... Doing this in advance means we're not showing the check in some cases where it would later fail because of macros.

However, macros support is poor in the current state of the checks anyway, we'll need to invest some more time to make it better.

clang-tools-extra/unittests/clangd/TweakTests.cpp
116 ↗(On Diff #184041)

Seems like a good default for a constant string, i.e. it can compute the size at compile time.

This revision was automatically updated to reflect the committed changes.
nridge added a subscriber: nridge.Jan 31 2019, 4:39 PM

This commit is causing clangd to fail to link, with errors like this:

tools/clang/tools/extra/clangd/refactor/tweaks/CMakeFiles/obj.clangDaemonTweaks.dir/SwapIfBranches.cpp.o:SwapIfBranches.cpp:function clang::RecursiveASTVisitor<clang::clangd::(anonymous namespace)::FindIfUnderCursor>::TraverseParmVarDecl(clang::ParmVarDecl*): error: undefined reference to 'clang::ParmVarDecl::hasDefaultArg() const'

(and dozens more that are similar).

Herald added a project: Restricted Project. · View Herald TranscriptJan 31 2019, 4:39 PM

This commit is causing clangd to fail to link, with errors like this:

tools/clang/tools/extra/clangd/refactor/tweaks/CMakeFiles/obj.clangDaemonTweaks.dir/SwapIfBranches.cpp.o:SwapIfBranches.cpp:function clang::RecursiveASTVisitor<clang::clangd::(anonymous namespace)::FindIfUnderCursor>::TraverseParmVarDecl(clang::ParmVarDecl*): error: undefined reference to 'clang::ParmVarDecl::hasDefaultArg() const'

(and dozens more that are similar).

Hi Nathan,
What platform is this on? Not seeing it on the buildbots.
Anything unusual in build setup (standalone build, building with shared libraries, etc)?

Hi Nathan,
What platform is this on? Not seeing it on the buildbots.
Anything unusual in build setup (standalone build, building with shared libraries, etc)?

I'm on Linux, building with shared libraries. Not sure what standalone means in this context, I'm building the monorepo with -DLLVM_ENABLE_PROJECTS="clang".

nridge added a comment.EditedJan 31 2019, 5:55 PM

The complete command that's failing is:

/usr/bin/clang++-8  -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -std=c++11 -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -fno-common -Woverloaded-virtual -Wno-nested-anon-types -O0 -g3  -fuse-ld=gold -Wl,-allow-shlib-undefined tools/clang/tools/extra/clangd/refactor/tweaks/CMakeFiles/obj.clangDaemonTweaks.dir/SwapIfBranches.cpp.o tools/clang/tools/extra/clangd/tool/CMakeFiles/clangd.dir/ClangdMain.cpp.o  -o bin/clangd  -Wl,-rpath,"\$ORIGIN/../lib" lib/libLLVMSupport.so.9svn -lpthread lib/libclangBasic.so.9svn lib/libclangTidy.so.9svn lib/libclangDaemon.so.9svn lib/libclangFormat.so.9svn lib/libclangFrontend.so.9svn lib/libclangSema.so.9svn lib/libclangTooling.so.9svn lib/libclangToolingCore.so.9svn
clang-tools-extra/trunk/clangd/SourceCode.cpp