Page MenuHomePhabricator

[clangd] Interfaces for writing code actions

Authored by ilya-biryukov on Jan 3 2019, 5:23 AM.



The code actions can be run in two stages:

  • Stage 1. Decides whether the action is available to the user and collects all the information required to finish the action. Should be cheap, since this will run over all the actions known to clangd on each textDocument/codeAction request.
  • Stage 2. Uses information from stage 1 to produce the actual edits that the code action should perform. This stage can be expensive and will only run if the user chooses to perform the specified action in the UI.

Trivial actions can also produce results in stage 1 to avoid an extra
round-trip and read of the AST to reconstruct the action on the server.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
  • Remove 'using namespace llvm'

I've added a few sample actions, please take a look.

The major thing that's missing from the design is how we can define an interface for actions to get the nodes they interested in from the AST without doing an AST traversal in each of the actions separately. I have a few options in mind, will be happy to explore more of this.
The other major problem that I've run into while playing around with the action in VSCode is increased latency: my suspicion is that the codeAction requests are not getting cancelled, I'll confirm and file an issue against VSCode if that's the case.

High level comments:

  • this is an important concept and in hindsight it's amazing we didn't have it yet!
  • I don't get a clear sense of the boundaries between "code action", "refactoring", and "action". There are also "prepared actions" and "instant actions". I think some combination of fewer concepts, clearer names, and more consistent usage would help a lot.
  • it's pretty heavy on registries/factories/preparedX, maybe we can find a way to simplify
  • many (most?) actions are going to do some "walk up AST looking for X", the interface could help make that easy

Seems a bit more direct to give each one its own command name? Maybe under a namespace like refactor/swapIfBranches


This seems like it could just be a helper function... what does it capture?
I think it might belong next to PreparedActions just like we have toLSPDiags in Diag.h - it's not ideal dependency-wise, but inlining everything into ClangdLSPServer seems dubious too. Happy to discuss offline...


This optimization seems inessential, and complicates the model a bit. Can we leave it out for now?


Unclear what the second sentence means, I'd suggest dropping it. Most obvious (to me) interpretation is that this can't fail, which isn't true.
nit: "Previously precomputed" is redundant :-)

16 ↗(On Diff #181321)

why is ActionProvider in refactor/ but CodeActions/ is here?

23 ↗(On Diff #181321)

I'm not sure this indirection is necessary, would a StringMap<u_p<ActionProvider>> suffice?

Since it's not actually complicated and there's just one caller, I think knowing what prepareAll is doing at the callsite would be an improvement.

67 ↗(On Diff #181321)

I think we should consider (not necessarily adopt) designing the interface around walking up the AST node chain, rather than bolting it on later.

71 ↗(On Diff #181321)

I'd prefer a string here, this will make the protocol easier to understand for human readers.

78 ↗(On Diff #181321)

This is a concrete class that holds a type-erased unique_function with the actual logic. Similarly, the factories are just wrappers of unique_function.
This method of abstraction is harder (for me at least) to follow than expressing the commonality of the important objects (the refactorings) as inheritance. And more concretely, stack traces are going to be terrible :-)

The factory/prepared duality makes it hard to represent a refactoring as a single class, but let's have a think about this part.

87 ↗(On Diff #181321)

Why is this dynamic and not a constructor param?

To be a bit more concrete...

One approach which won't win any OO design points but might be quite easy to navigate: combine the ActionProvider and PreparedAction into one class.

class SwapIfBranches : public MiniRefactoringThing {
  SwapIfBranches(); // must have a default constructor.
  const char* id() const override { return "swapIfBranches"; } // must be constant for the subclass.
  bool prepare(const Inputs&) override; // may only be called once. Returns false if this operation doesn't apply.
  std::string title() override; // REQUIRES: prepare() returned true
  Expected<tooling::Replacements> apply() override; // REQUIRES: prepare() returned true

note that the "subclasses of a common interface, have a default constructor" fits exactly with the LLVM Registry pattern, if we want to move to that.
Otherwise, a StringMap<unique_ptr<MiniRefactoringThing>>

A more... complicated version could use something like CRTP to give a generic external interface while keeping the internal dataflow/signatures clean:

class SwapIfBranches : public MiniRefactoringThing<SwapIfBranches> {
  struct State { ... }; // completing forward declaration from CRTP base
  Optional<State> prepare(const Inputs&) const override;
  std::string getTitle(const State&) const;
  Expected<tooling::Replacements> apply(const State&) const override;
const char MiniRefactoringThing<SwapIfBranches>::ID[] = "swapIfBranches";

Similarly we could use Registry or a StringMap<unique_ptr<MiniRefactoringThingIface>> where the CRTP base implements the interface.

Name ideas:

  • "Refactoring" - I think the main drawback is the conflicts with potential global-refactorings that we do in clangd. I don't think the conflict with tooling/refactor is bad enough to avoid this. I also don't think "some of these don't look like refactoring to me" is a strong reason to avoid it.
  • "Code action" - I think the partial overlap with the protocol concept is confusing. (Worse, the protocol has two things called code action, that partially overlap). It's also not really a standard term outside LSP/VSCode, so we should be able to come up with something as good.
  • "Action" - much too vague, I think
  • "RefactoringAction" (or refactoring/Action, or refactoring::Action...) - not terrible but I think dominated by just "Refactoring" or "Refactor".
  • "AutoEdit" or something equally descriptive - this is bland along the lines of Code Action, may be ok if it's sufficiently short and unique
  • "Swizzle" or something equally unique-and-semantics-free - I'd be happy with this kind of alternative if we can't find something that clearly means what we want

Offline discussion:

  • CRTP doesn't buy anything. Simple inheritance is less clean than current function-passing version but probably more familiar.
  • Tweak is a good name: short, evocative but not commonly used, works as a noun and a verb.
  • Registry might be a good enough fit to use directly

Here's the result of my local, er, tweaking to ActionProvider.h.

//===--- Tweak.h -------------------------------------------------*- C++-*-===//
//                     The LLVM Compiler Infrastructure
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
// A tweak is a small context-sensitive refactoring-like action that is
// triggered at a cursor location, examines the AST and produces a set of edits.
// It has two main pieces of logic:
//   - prepare() quickly checks whether the action is applicable at the cursor.
//     It must be cheap as all tweaks are prepared to list them for the user.
//   - apply() computes the actual edits.
//     This can be more expensive, only the chosen tweak is applied.


#include "ClangdUnit.h"
#include "Protocol.h"
#include "llvm/ADT/Optional.h"
#include "llvm/ADT/StringRef.h"
namespace clang {
namespace clangd {

// Abstract base for small context-sensitive refactoring actions.
class Tweak {
  // Information about the location where the tweak was triggered.
  struct Selection {
    // The path of an active document the action was invoked in.
    llvm::StringRef File;
    // The text of the active document.
    llvm::StringRef Code;
    // Parsed AST of the active file.
    ParsedAST &AST;
    // A location of the cursor in the editor.
    SourceLocation Cursor;
    // FIXME: full selection bounds.
    // FIXME: AST node under cursor.
    // FIXME: allow tweaks to specify matchers, and include match results.
    // FIXME: allow access to index and other source files.

  // Tweaks must have a no-op default constructor.
  virtual ~Tweak() = default;

  virtual const char* id() = 0;
  // Determines whether this tweak can run at the selection.
  // This function may record information to be used later.
  // It should run as fast as possible, particularly when returning false.
  virtual bool prepare(const Selection &) = 0;

  // Generate the edits for this tweak.
  // REQUIRES: prepare() was called and returned true.
  virtual llvm::Expected<tooling::Replacements>
  apply(const Selection &) = 0;

  // Description of this tweak at the selected location.
  // e.g. "Out-line definition of toString()".
  // REQUIRES: prepare() was called and returned true.
  virtual std::string describe() const;

// All tweaks must be registered, next to their definition.
#define REGISTER_TWEAK(Subclass)                                               \
  static ::llvm::Registry<::clang::clangd::Tweak>::Add<Subclass>                 \
      TweakRegistrationFor##Subclass(Subclass{}.id(), #Subclass);

// Calls prepare() on all tweaks, returning those that can run on the selection.
std::vector<std::unique_ptr<Tweak>> prepareTweaks(const Tweak::Selection &);
// Calls prepare() on the tweak with a given ID.
// If prepare() returns false, returns an error.
Expected<std::unique_ptr<Tweak>> prepareTweak(StringRef ID,
                                              const Tweak::Selection &);

} // namespace clangd
} // namespace clang

  • Rename ActionProvider to Tweak
  • Update an interface of Tweak
  • ActionInputs -> Tweak::Selection
  • Remove CodeAction.h, use tweak registry instead

(Mostly just comments on Tweak, I think the comments on other parts still apply)


naming nits:

  • this should mention "main-file"
  • doesn't need to mention "position" as it's just the param type
  • 'sourceLoc' seems an odd abbreviation

Maybe sourceLocationInMainFile?

(Current name is consistent with sourceLocToPosition but that name is bad too...)


Can we drop this indirection and use the registry directly?


(nit: I'd make this operator< of Tweak? e.g. if priority is added, it should be private)


nit: can we move this to the cpp file and out of the public interface?
(or keep the registry public, but drop prepareTweak[s] and just make callers do it)


should this return ErrorOr<unique_ptr<Tweak>> so we can treat both missing ID and prepare failed as errors with messages?

ilya-biryukov marked an inline comment as done.
  • Build tweaks as an object library. To make sure linker does not optimize away global constructors.

Haven't yet addressed all the comments, but switched to use the "object library" (i.e. a collection of .o files) to make sure linker does not optimize away global ctors required by registry.


Sure, would mean linear time for prepareTweak, but we probably don't care.

ilya-biryukov marked 9 inline comments as done.
  • Remove intermediate StringMap, use registry directly
  • Rename codeAction to codeTweak in ClangdServer
  • Rename a helper to get position to sourceLocationInMainFile
  • Move the Registry typedef into the .cpp file
  • Make toCodeAction a helper function
  • Update comments
ilya-biryukov added inline comments.Jan 17 2019, 9:37 AM

I'm not sure what we're winning here.

Currently we have: command:{ name: "applyCodeAction", args: ["swap-if-branches", "…"] }
In the proposed approach we'd have: command: { name: "swap-if-branches", args: […] }

The latter would definitely mean more fiddling at least to communicate the list of supported actions. Both approaches don't seem to require any changes on the client side, even though they affect the messages we send over the protocol.


I've moved it into a helper function, moved away from using a lambda.
However, kept it in ClangdLSPServer.cpp, Diag.h does feel like the wrong place for it, it would be nice to find a better place for both functions.


My personal preference would be to keep it in .cpp file to avoid complicating the public interface with a function and a friend declaration for prepareTweaks (let me know if I misinterpreted your proposal).
Let me know if you have strong preference wrt to this one


I don't think that makes a big difference, though: hiding the typedef would hide the fact we're using the Registry, though. We still have REGISTER_TWEAK that mentions it in the public interface.

This should be ready for another round of review. Let me know if I missed any of the older comments.
There are two more things that I'd like to do before this lands:

  • go through the docs again and update/simplify them to make sure they're in sync with the new interface,
  • publish the tests I have, they're not 100% ready yet, so I'm keeping the, local for now. This could be a separate change, though I don't think this should matter much if the rest is dealt with.

This looks really good, everything else is pretty superficial I think.

I think we'll want to add one lit test for the whole two-step tweak workflow, as well as TestTU/annotation-based unit-tests for each tweak.
As this patch has no actual tweaks in it, we can work out the details when landing the first one.


comment: tweak must have been successfully prepared?


Comment should be more tentative (calculating edits *may* be expensive) since we don't have the immediate-edit optimization yet.
Decent place for a FIXME or comment regarding the optimization too.


The returned Tweak object contains references to the AST, which may be dangling, so it's not actually safe to use.
It would be nice to allow callers to either directly apply a returned tweak (with progress saved) or to apply one "later" by name, but the way that transactions work in ClangdServer only the latter is really possible.

So I'd suggest this should return vector<pair<TweakID, string>>, which makes it clear(ish) how this relates to applyCodeTweak




tweak? It's a verb...






"tweakId" or just "id"




Not convinced about this helper function.

  • much of it is just a passthrough to struct initialization. I think the code calling it would be clearer if it was initialising the fields one-by one
  • the only part that's not passthrough is already a function call with a clear name, calling it seems reasonable
  • I'm not sure it makes sense to have the Range -> SourceLocation conversion in this file, but the Tweak -> CodeAction conversion outside this file (and not unit-testable). There's an argument to be make to keep this file independent of LSP protocol structs, but I think that argument applies equally to this function.

Expected<Selection>? Passing an invalid range is always an error I guess.


nit: one of my lessons from clang-tidy is that mapping between IDs and implementations is annoying.
Since IDs are 1:1 with classes, can we just require this to be the class name?

(If you wanted, I think you could adapt REGISTER_TWEAK so that it goes inside the class defn, and then it could provide the override of id() itself)

sammccall added inline comments.Jan 18 2019, 7:48 AM

(action context?)


we should format::cleanUpAroundReplacements... fine to leave this as a FIXME


Hmm, maybe we should drop this until we know how cross-file tweaks will work?

ilya-biryukov marked 10 inline comments as done.
  • Update comments
  • Return an ID+Title instead of a Tweak object
  • Rename codeAction -> tweak everywhere
  • Remove Tweak::Selection::File
  • Return Expected<Selection> instead of Optional<Selection>
  • Add a FIXME to apply the formatter on top of the actions

Added a FIXME. We probably want a helper that does this, to reuse in tests and ClangdServer.


That's too short for my taste, I went with applyTweak and it also feels helpful if we don't switch between a verb and a noun to keep clear a parallel between a Tweak returned by enumerateTweaks and a tweak applied in applyTweak.

Let me know if you feel strongly about this :-)


Used applyTweak for symmetry with applyFix (and other reasons mentioned in the comments for ClangdServer::tweak)


The reason I added it is to avoid duplication between in the test code and ClangdServer, which are the only two clients we have.
I expect this to be more useful when we add a way to traverse the subset of the AST in the checks


That would mean no two tweaks are allowed to have the same class name. This is probably fine, but somewhat contradicts C++, which would solve it with namespaces.
To be fair, there's a simple trick to grep for the id to find its class, so I'd keep as is.

If we choose to adapt REGISTER_TWEAK, that would mean we force everyone to put their tweaks only in .cpp files. That creates arbitrary restrictions on how one should write a check and I'm somewhat opposed to this. But happy to reconsider if you feel strongly about this.

Can live with applyTweak etc, though it makes me sad.


I understand. I think as things stand both callers would be clearer (if a couple of lines longer) without this helper.

What the API should be in the future - happy to talk about that then.


I don't care about the details (e.g. whether REGISTER_TWEAK sets the name, asserts the name, or none of the above).

I do care that we don't add a second ID for a class that's not equal to the class name. This is both a bad idea from first principles and from experience with clang-tidy.
If this were ever to become a real problem, I'm happy to include the namespace name in the ID.

ilya-biryukov marked 3 inline comments as done.
  • Inline Tweak::Selection::create()
  • Define Tweak::id() in REGISTER_TWEAK.
ilya-biryukov marked an inline comment as not done.Jan 22 2019, 2:42 AM
ilya-biryukov added inline comments.

I'd keep this function, here is my reasoning:

  1. There are 4 usages already (2 in this patch, 2 in the tests), keeping them in sync means an annoying (to my taste) amount of copy-paste.
  2. This function is not completely trivial, as it makes a decision on what a CursorLoc is and documents it.
  3. Making changes is simpler: e.g. if we add more fields to Tweak::Selection, we'd end up changing this function (possibly its signature too) and all of its callers. There's almost no room for error. If we remove it, one would have to find and update all usages by hand and make sure they're the same.

I did update the patch to inline it, but let me know if any of this convinces you.


Done, REGISTER_TWEAK now defines id.

Out of curiosity, what are the problems with clang-tidy checks that you mention? Finding a check by its name being hard is something that comes to mind. Anything else?

ilya-biryukov marked an inline comment as done.Jan 22 2019, 2:48 AM
ilya-biryukov added inline comments.
27 ↗(On Diff #182870)

Please note I added these assertions here.

It feels weird to traverse twice on every call to prepareTweaks, but that's the simplest option I could come up with.

ilya-biryukov marked an inline comment as not done.Jan 22 2019, 2:48 AM
sammccall accepted this revision.Jan 25 2019, 8:49 AM
sammccall added inline comments.
425 ↗(On Diff #182870)

tweak or applyTweak

634 ↗(On Diff #182870)

comment here?

59 ↗(On Diff #182870)

(can we add a unit test for this function?)

27 ↗(On Diff #182870)

Looks fine - you could consider extracting to a separate function validateRegistry or so, but up to you

This revision is now accepted and ready to land.Jan 25 2019, 8:49 AM
ilya-biryukov marked 7 inline comments as done.
  • s/applyCodeAction/applyTweak
  • Added a comment to TweakArgs
  • Added a unit test for sourceLocationInMainFile
  • Extract a 'validateRegistry' function
  • Do not log CancelledError.
ilya-biryukov added inline comments.Jan 28 2019, 1:16 PM
425 ↗(On Diff #182870)

Done. Thanks for spotting this, totally missed it.

59 ↗(On Diff #182870)

Added a test in ClangdUnitTests for a lack of a better place. SourceCodeTests only test non-source-manager-related things, so adding a dependency on SourceManager there seemed weird.

Happy to move them elsewhere, let me know what you think.

  • Update license headers in new files
  • Add an empty cpp file to avoid cmake errors due to empty inputs
  • clang-format
  • Update the 'fixits-command.test' to unbreak it (the line number was larger than the number of lines in the file)
This revision was automatically updated to reflect the committed changes.