Page MenuHomePhabricator

[clangd] Add support for inline parameter hints
ClosedPublic

Authored by nridge on Mar 16 2021, 4:15 PM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
nridge added a comment.EditedMar 16 2021, 4:34 PM

Some notes on this:

This feature is based on https://github.com/microsoft/language-server-protocol/issues/956, which combines inline parameter hints with other types of annotations that can be shown inline under a single feature called inlay hints. The protocol is designed to accommodate different types of hints, though the only kind currently provided are parameter hints.

The protocol bits are more specifically based on https://github.com/microsoft/vscode-languageserver-node/pull/609/, with some changes to reflect the discussion in the LSP issue (notably, the addition of a position field as discussed in this comment).

I have a corresponding client-side patch here: https://github.com/clangd/vscode-clangd/pull/168

I do have some questions about the details of the protocol, which I've asked inline.

clang-tools-extra/clangd/ClangdLSPServer.cpp
1478

I expect the protocol in this patch to be largely in line with the version standardized in LSP, but there may be some minor differences as that's not yet finalized.

Does that mean we should rather name this clangd/inlayHints for now?

clang-tools-extra/clangd/Protocol.h
1496

I'm not sure if there's much point in keeping these constants in the protocol. With the addition of InlayHintPosition, the InlayHintKind no longer carries any semantic meaning, beyond allowing the client to enable/disable hints on a per-kind basis.

I do think that enable/disable granularity is useful, but it could also be accomplished with a freeform string as the kind. Should I change it to a string? Or keep the enum for clangd's use but serialize it as a string?

1531

In the protocol documentation, I've repeatedly used the term "token" to describe the fragment of code to which a hint applies, but technically it's not necessary a single AST token. For example, for parameter hints, they apply to the entire argument expression. Should I say something like "AST node" instead?

nridge updated this revision to Diff 333778.Mar 29 2021, 12:33 AM

Some improvements

Adding a few potential reviewers.

Thanks for working on this! It looks like it's getting traction in VSCode/LSP now, which makes it much more compelling.

Protocol stuff: obviously in the long run we want to match where LSP goes, and the fewer changes between now and then the better.
But there's enough uncertainty that I don't think we have much chance of guessing right enough to be compatible at this point.
So rather than introducing another variant that's likely to be "wrong", let's use rust-analyzer's protocol as-is?
I don't think leaving out "position" is a big deal as long as we're only providing one type of hint.
(I'd even be tempted to call it "rust-analyzer/inlayHints" but maybe that's too silly)


I dug a bit into the differences based on:

Looks like VSCode calls this "inline hints" but has a TODO to change back to "inlay hint". So naming seems likely correct.

Ranges/positions are very different! In particular, I think range has the same name+type but different semantics.
For the parameter hint:

foo(bar: 42);
The rust-analyzer protocol wants range = `foo([[42]]);`, label="bar: " position = before
The VSCode API wants range = `foo([[]]42);`, label="bar:", whitespaceAfter = true

This is a tough call because I think the rust-analyzer proposal is better (retains more info + flexibility like you say) but the VSCode one seems fairly likely to win :-/ We should provide feedback here.

"text" and "label" seem an arbitrary different choice of name.

sammccall added inline comments.Mar 31 2021, 1:43 PM
clang-tools-extra/clangd/ClangdLSPServer.cpp
1478

I think differences are quite likely, as VSCode APIs diverge from what's in the PR.

So yeah, I think this should be clangd/inlayHints or rust-analyzer/inlayHints.

clang-tools-extra/clangd/Protocol.h
1496

We should at avoid having string literals scattered all over the impl + tests.

I like enum-serialized-as-string a little better than string-with-constants-exposed...

(Also it seems reasonably likely they'll pull the same dictionary trick as with semantic-tokens, so we can send enum numbers over the wire)

1501

any point including the unused kinds here, rather than leaving FIXMEs for ideas we have?

(I ask because it's not obvious to me that the caret is in the right place in that example, but it seems silly to worry about it now)

1501

implicit conversions/constructor calls is a potentially interesting set of hints too

1527

as mentioned I don't think we need to include position for now. If we do, 0/1 seems like an unlikely representation...

1531

I'd just say something like "the range of source code"... Both "token" and "ast node" seem unnessarily specific.

clang-tools-extra/clangd/XRefs.cpp
1977

It's not obvious to me exactly what cases "implicit constructor call" covers, can you be more precise?
I suspect this comes down to "does the constructor look like a function call with an argument list", in which case getParenOrBraceRange() is obviously the right check.

1979

I think you also want to exclude E->isStdListInitialization - in this case you have braces but it's not the arg list

1999

nit: can we take an ArrayRef instead of pointer+length?

2004

do we not want to assume that an incomplete call is a prefix?
Seems like the inlays might be helpful in diagnosing that situation :-)

Actually I guess we'd have to handle RecoveryExpr to get that to work. I did have an idea once that RecoveryExpr would contain a Stmt::Kind indicating what we *tried* to build, which we could use here.

Anyway, something for the future.

2004

should we also bail out if numParams > argcount and the function isn't variadic?
Not sure we'll actually get here, but that goes for the < condition too...

2012

Ah, redecls are interesting. Many possible variations on this strategy.

For example:

  • we could iterate over all redecls instead of just canonical + definition
  • we could fall back arg-by-arg rather than for the whole arg list

I think it might be worth moving this into getParameterNames (or introducing an intermediate function) to allow some more implementation flexibility (now or in the future).

2024

Very nice.
I think we also want to omit it if there's a /* foo= */ comment (whitespace and = optional).

It's extremely annoying that TokenBuffer doesn't capture comments, but I think it would work well enough to simply match the source code up to the insertion point textually.

2030

It's really tempting to include by-ref-ness as foo&: or something, but the distinction between const/non-const is critical and I can't think of a similarly compact way to denote that.

2046

How do you feel about stripping leading underscores from names, in order to prettify standard library calls?

e.g. libc++'s std::copy has parameters __first, __last and __result to guard against macros, libstdc++ and MSVCSTL seem to do the same. It's going to look really noisy!

(I don't think we currently do this anywhere, it's obviously relevant to signature help & code completion too... but have to start somewhere!)

2059

I don't think param decls can be special names, so instead of printing can you just grab the ASTContext-owned StringRef from the DeclarationName? Then we can return vector<StringRef> with no copies.

And while here, it seems like SmallVector<StringRef, 8> or so would be even better.

(Can merge this with the anonymous case: either getAsIdentifierInfo gives you an identifier or you insert empty)

2066

Ooh, macros :-)

This is fine for now, but because this will be displayed always (for some users) failures are going to be fairly visible.

I don't think this check is quite sufficient to be safe:

#define PATHOLOGICAL(X) X , X
atan2(y: 1 + PATHOLOGICAL(2) + 4); // confusing! prefer no hint

and also excludes useful cases. Consider:

#define PI 3.14
atan2(PI, PI); // want hints
ASSERT(atan2(0, 1) == 0); // want hints

We don't really want to place a macro-handling tax on every type of hint we emit, so it would be nice to handle it generically here.
But it doesn't seem trivial and I'm wary of embarking on another toHalfOpenFileRange-like mission.

Maybe a compromise:

  • we pass an "anchor" location, which for param-hints will be the ( - and we bail out if the ) is in a different file
  • we bail out if the anchor is in a macro body (getTopMacroCallerLoc isn't the main file)
  • we convert R to a range in the anchor file by following expansion edges. This range must be precise, i.e. not cover any other parsed tokens. This means the start must always be isAtStartOfImmediateMacroExpansion etc.
  • now this range is guaranteed to be usable, just take the spelling locations of the endpoints

(Doing this with *just* the sourcerange R is probably possible but the heuristics to determine the anchor fileID seem complicated)

clang-tools-extra/clangd/XRefs.h
135 ↗(On Diff #333778)

XRefs.cpp has too much stuff in it today.
This patch doesn't add too much but... any reason not to put it in a new file?

clang-tools-extra/clangd/test/initialize-params.test
76

I think it's important we namespace this one for now (probably more important than the method itself!)

Note that vscode-clangd and clangd aren't version-locked

clang-tools-extra/clangd/unittests/InlayHintTests.cpp
189

I guess/hope you don't want to handle CXXUnresolvedConstructExpr too. If so, there are cases to deal with like when we can't tell whether or not we're calling an init-list-constructor...

nridge added a comment.Apr 4 2021, 1:07 AM

Thanks for the review!

So rather than introducing another variant that's likely to be "wrong", let's use rust-analyzer's protocol as-is?
I don't think leaving out "position" is a big deal as long as we're only providing one type of hint.

I do plan on working on other hint types (especially TypeHint) as a follow-up.

But we could still leave out "position", and let the client infer before vs. after based on the hint type (this what the Rust client does).

I dug a bit into the differences based on:

Ranges/positions are very different! In particular, I think range has the same name+type but different semantics.
For the parameter hint:

foo(bar: 42);
The rust-analyzer protocol wants range = `foo([[42]]);`, label="bar: " position = before
The VSCode API wants range = `foo([[]]42);`, label="bar:", whitespaceAfter = true

This is a tough call because I think the rust-analyzer proposal is better (retains more info + flexibility like you say) but the VSCode one seems fairly likely to win :-/ We should provide feedback here.

Good catch. Digging into that PR a bit more, it seems like the reason for the range working in this way is that the C# language server has an "overwrite" mode for inline hints that you can enter, where the provided range is overwritten (temporarily) with the hint contents.

For example, if the hint range were [[auto]] x = 42;, in overwrite mode that would be rendered as int x = 42;.

In this model, it makes sense to make the range nonempty for some hint types (like the [[auto]] case), but not others (like parameter hints, where you wouldn't want the parameter name to replace the argument value).

So... if we're going to provide feedback here, we should probably suggest something that accommodates this use case. Perhaps an additional flag in the hint, to indicate if the "overwrite" behaviour is sensible for it? (Or perhaps the hint type is enough to infer that?)

nridge updated this revision to Diff 335210.Apr 4 2021, 11:54 PM
nridge marked 14 inline comments as done.

Address some review comments (others remain)

clang-tools-extra/clangd/Protocol.h
1501

The motivation for the proposed placement of type hints is that in several other languages (e.g. Rust, TypeScript), you can write both let var : type = init; (where the type is explicitly specified) and let var = init; (where the type is deduced). In those languages, the natural placement of the type hint is the place where the explicit annotation would otherwise go.

For consistency / familiarity, I figured it would make sense to use the same placement for C++.

Anyways, I've removed the enumerator for type hints now, we can discuss them further in the patch that adds them.

clang-tools-extra/clangd/XRefs.cpp
1979

Good catch, thanks!

I tried to write a test for this, but TestTU doesn't seem to like #include <initializer_list> (or any other standard library include). Do you know why that might be?

2004

You make some great points here:

  1. We don't seem to get here for any sort of invalid call. I guess that means we could probably replace this check with an assert, but given the complexity of C++ I always worry about edge cases...
  2. For variadic functions, we currently repeat the variadic parameter name for each matched argument. That doesn't seem great, it's probably better not to show hints at all for those args?
2004

Ha, I did find a legitimate case where the Callee->getNumParams() < ArgCount condition holds: C-style varargs functions. We currently don't show any hints for those, but presumably it would be better to show hints for the non-vararg parameters.

2024

I think we also want to omit it if there's a /* foo= */ comment (whitespace and = optional).

In my mind, one of the points of inline parameter hints is that they fulfil the functions of /* foo = */ comments without you actually having to write them.

Thus, I would argue for leaving the hints there, as a signal that the comment can now be removed.

2046

Sounds like a good idea to me!

2066

#define PATHOLOGICAL(X) X , X
atan2(y: 1 + PATHOLOGICAL(2) + 4); // confusing! prefer no hint

Surely, people who write this deserve a confusing hint?

(I kid, I kid. Your macro suggestions are reasonable and I will give them a try.)

clang-tools-extra/clangd/unittests/InlayHintTests.cpp
189

It hadn't occurred to me, but yeah, that seems like a good reason to avoid handling CXXUnresolvedConstructExpr for now.

nridge planned changes to this revision.Apr 4 2021, 11:57 PM

Marking this as "Changes Planned" because there are more review comments I need to address.

njames93 added inline comments.
clang-tools-extra/clangd/XRefs.cpp
2024

In my mind, one of the points of inline parameter hints is that they fulfil the functions of /* foo = */ comments without you actually having to write them.

Thus, I would argue for leaving the hints there, as a signal that the comment can now be removed.

I disagree with that mentality, Not everyone on a project will be using a language server/ide that supports inline hints. A push to remove those comments will harm other contributors who don't see the inline hints. It would also harm readability where the code is hosted as they wont show inline hints.

Having said that, inline hints definitely makes code easier to work on so there is still a lot of value in this feature.

This is looking pretty good! Can't wait to try it, and some of my suggestions are just scope creep, so feel free to draw the line :-)

Sorry, I think phabricator has lost track of line numbers that some of the comments were attached to :-(

Thanks for the review!

So rather than introducing another variant that's likely to be "wrong", let's use rust-analyzer's protocol as-is?
I don't think leaving out "position" is a big deal as long as we're only providing one type of hint.

I do plan on working on other hint types (especially TypeHint) as a follow-up.

But we could still leave out "position", and let the client infer before vs. after based on the hint type (this what the Rust client does).

Yeah, this would be my preference (mostly because Rust does it and it's fewer fields, not because it's actually better).

I dug a bit into the differences based on:

Ranges/positions are very different! In particular, I think range has the same name+type but different semantics.
For the parameter hint:

foo(bar: 42);
The rust-analyzer protocol wants range = `foo([[42]]);`, label="bar: " position = before
The VSCode API wants range = `foo([[]]42);`, label="bar:", whitespaceAfter = true

This is a tough call because I think the rust-analyzer proposal is better (retains more info + flexibility like you say) but the VSCode one seems fairly likely to win :-/ We should provide feedback here.

Good catch. Digging into that PR a bit more, it seems like the reason for the range working in this way is that the C# language server has an "overwrite" mode for inline hints that you can enter, where the provided range is overwritten (temporarily) with the hint contents.

For example, if the hint range were [[auto]] x = 42;, in overwrite mode that would be rendered as int x = 42;.

In this model, it makes sense to make the range nonempty for some hint types (like the [[auto]] case), but not others (like parameter hints, where you wouldn't want the parameter name to replace the argument value).

So... if we're going to provide feedback here, we should probably suggest something that accommodates this use case. Perhaps an additional flag in the hint, to indicate if the "overwrite" behaviour is sensible for it? (Or perhaps the hint type is enough to infer that?)

Right. There are a few interlocking points here:

  • when inserting text, it's nice to be able to say what range it applies to (so it can appear "on demand") - do we have examples of editors where this interaction works well?
  • it's also nice to be able to either insert or overwrite (e.g. for expanding deduced types)
  • the mode/presentation is related to the hint type, but not in a way that predictably spans languages

One neat way to resolve this is to make position explicit, and have the allowed values be before/after/replace.

My other pieces of feedback would be

  • whitespaceBefore/whitespaceAfter doesn't belong in the protocol and can be inferred from the context/content.
  • it will be nice to eventually have good support for code actions/commands targeting these hints, as they're often for deduced info that can be explicit

(Happy to leave some version of this myself, but my "experience" is indirect :-))

clang-tools-extra/clangd/InlayHints.cpp
41

nit: This is *a* reason, but I think the names used in practice wouldn't be much help, either.
(Typically "A" or "LHS").

44

nit: you've written the skipped cases (constructor vs call) in different styles, unify?

68

A few more "obvious" cases to exclude: if there's exactly one parameter and the callee is (a memberexpr/declrefexpr named) "set*"?

If the callee is a copy/move constructor?

71

we may want to rather factor this as shouldHint(Expr *E, StringRef Hint) - I suspect we'll hit other cases where the hints are not useful and may add heuristics

82

consider E->IgnoreUnlessSpelledInSource() or E->IgnoreParenCasts() or one of the other (countless!) variants...

clang-tools-extra/clangd/InlayHints.h
10

maybe include a reference to the version being implemented, as there's likely to be more proposals soon

clang-tools-extra/clangd/XRefs.cpp
1979

Well, we'd need to include a standard library implementation in the tests (libc++ isn't part of the compiler).
initializer_list is not a built-in header, I don't know particularly why.

This appears to be enough to get init list construction to work:

namespace std { template <class> class initializer_list {}; }
2004

Yeah for varargs we should hint the fixed params, and then could choose to apply the vararg name to none/first/all of the varargs.
I'd suggest none, to be conservative - can always tweak later.

We don't seem to get here for any sort of invalid call. I guess that means we could probably replace this check with an assert, but given the complexity of C++ I always worry about edge cases

Agree it's better to just be a bit defensive here.
A simple option is just iterating up to min(actual args, num required params) and ignoring the rest.

2024

+1 to njames93 - people are using other tools (we don't have these features in review workflows... yet!)

e.g. Google has a style guide that encourages these, and having this feature in clangd won't change that. Given that people *shouldn't* remove the comment, showing the hint as well is annoying.

Eventually maybe the "overwrite" mode would be a good fit here (you can make a reasonable case for either the comment or the hint being preferable). But either way I think we need to detect them.
If detecting /*foo=*/ seems too specific, we could also bail out if we find any inline param comment...

2066

atan2(y: 1 + PATHOLOGICAL(2) + 4); // confusing! prefer no hint

Surely, people who write this deserve a confusing hint?

Yeah, on reflection this case isn't important, feel free to ignore it entirely.

Supporting idiomatic macro usage *is* useful but if the filtering/location logic gets too complicated feel free to defer with a FIXME.

clang-tools-extra/clangd/unittests/InlayHintTests.cpp
230

maybe add a FIXME test asserting no-hints-yet for aggregate initialization?
(not something to tackle in this patch, but seems like a fairly natural extension)

Hey, I just wanted to say thank you! This is a rust-analyzer feature I really like when writing Rust code but somehow never thought about bringing it to Clangd. This is a nice idea, huge thanks for implementing it!

nridge updated this revision to Diff 336774.Mon, Apr 12, 1:32 AM
nridge marked 18 inline comments as done.

Address most remaining review comments

I've left a couple of things as FIXMEs, as indicated

clang-tools-extra/clangd/XRefs.cpp
2024

Good points, thanks. I'd like to get to a stage where even review tools have these kinds of smarts, but we're not there yet.

I've left this as a FIXME for now (along with the setter-function case), but I'm happy to implement it in a follow-up (or in another revision of this patch if you prefer).

2059

I revised the code to avoid printName(), but using StringRef clashes with stripping leading underscores, so I kept it as std::string.

2066

I ended up implementing a simplified version of what you suggest:

  • I only check the ( (or rather, with CallExpr, only the ) since that's all it seems to expose), and assume the matching one is in the same place.
  • I don't worry about pathological cases where a macro overlaps an argument boundary and such.

The result does support both of your atan2 examples (which I agree are idiomatic and useful to support).

njames93 added inline comments.Mon, Apr 12, 3:01 AM
clang-tools-extra/clangd/ClangdLSPServer.cpp
1478

If this is compatible with rust analysers feature. Maybe we should be using their methods and feature flags spec. It will mean make life easier for clients wishing to support this before it gets baked into the official protocol.

sammccall accepted this revision.Mon, Apr 12, 4:06 AM
sammccall added inline comments.
clang-tools-extra/clangd/InlayHints.cpp
45

apparently user-defined literals are also CallExpr (what?) so I guess you want to exclude UserDefinedLiteral here

85

nit: SmallVector<8> seems like a clear (minor) win here

105

A MemberExpr with implicit-this should also not be hinted.

We may want to be looser than this, depending how much we value regularity vs brevity when it's obvious.
e.g. omit Foo: hint in

  • Bar::Foo (qualifiers present)
  • foo (case difference)
  • getFoo() etc

But we can certainly revisit this later.

105

can we also avoid printName here?

maybe change getDeclRefExpr to StringRef getSimpleExprName or so

152

nit: or C varargs

clang-tools-extra/clangd/XRefs.cpp
2059

using StringRef clashes with stripping leading underscores

How so? Name.ltrim('_') points to the same data as Name

This revision is now accepted and ready to land.Mon, Apr 12, 4:06 AM
nridge updated this revision to Diff 337055.Tue, Apr 13, 12:39 AM
nridge marked 5 inline comments as done.

Address remaining review comments

clang-tools-extra/clangd/ClangdLSPServer.cpp
1478

It's not 100% compatible. For example, for parameter hints, Rust expects the server to provide only the parameter name itself in label, and post-processes the labels on the client side to tack on the ": ".

We could make it 100% compatible. Do you think that would be valuable? That is, do you expect there will be cross-language client implementations of the pre-standard protocol, in places that are not a plugin shipped with a language server (like vscode-clangd or the rust-analyzer vscode plugin)?

(Would accommodating such client impls limit us from adding things like the suggested "implicit conversion hints"?)

clang-tools-extra/clangd/InlayHints.cpp
45

Good call, thanks!

105

Fixed the "MemberExpr with implicit-this" case.

As for the other suggestions: I can see the argument, but I think there's also value in simplicity ("the argument expression matches the hint exactly"), including for the purpose of not leaving the user wondering why we're not showing a hint.

152

As it happens, for C varargs, the notional "varargs parameter" doesn't count towards getNumParams() in the first place, so this function has (and needs) no logic for them.

clang-tools-extra/clangd/XRefs.cpp
2059

Just didn't think of that, thanks :-)

sammccall accepted this revision.Tue, Apr 13, 2:59 AM

Still LG

clang-tools-extra/clangd/ClangdLSPServer.cpp
1478

FWIW *I* don't think many pre-standard clients are likely or that important in the long run. Rather we should focus on proving the concept, getting a head start on the standard feature, being well-positioned to give feedback on standard proposals. So if it's not totally compatible, I like the clangd prefixes.

clang-tools-extra/clangd/InlayHints.cpp
152

Right - but maybe the caller doesn't know that :-)

nridge updated this revision to Diff 337342.Tue, Apr 13, 11:29 PM
nridge marked 2 inline comments as done.

Address final nit

This revision was landed with ongoing or failed builds.Tue, Apr 13, 11:31 PM
This revision was automatically updated to reflect the committed changes.

Btw., thanks for providing feedback on the LSP proposal!