This is an archive of the discontinued LLVM Phabricator instance.

Add semantic token modifier for non-const reference parameter
ClosedPublic

Authored by tom-anders on Aug 18 2021, 12:36 PM.

Diff Detail

Event Timeline

tom-anders created this revision.Aug 18 2021, 12:36 PM
tom-anders requested review of this revision.Aug 18 2021, 12:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2021, 12:36 PM

Thanks -- this patch is looking great so far!

clang-tools-extra/clangd/SemanticHighlighting.cpp
312

We should consider the case where a dependent name is passed by non-const reference, for example:

void increment_counter(int&);

template <typename T>
void bar() {
   increment_counter(T::static_counter);
}

This case does not work yet with the current patch (the dependent name is a DependentScopeDeclRefExpr rather than a DeclRefExpr), but we'll want to make it work in the future.

With the conflict resolution logic in this patch, the Unknown token kind from highlightPassedByNonConstReference() will be chosen over the dependent token kind.

As it happens, the dependent token kind for expressions is also Unknown so it doesn't matter, but perhaps we shouldn't be relying on this. Perhaps the following would make more sense:

  1. a token with Unknown as the kind has the lowest priority
  2. then a token with the DependentName modifier (next lowest)
  3. then everything else?
535

nit: sense

550
562

I think there are some edge cases where isConst() doesn't do what we want.

For example, I think for a parameter of type const int*&, it will return true (and thus we will not highlight the corresponding argument), even thus this is actually a non-const reference.

So, we may want to use a dedicated function that specifically handles reference-related logic only.

(This probably also makes a good test case.)

567

For a qualified name (e.g. A::B), I think this is going to return the beginning of the qualifier, whereas we only want to highlight the last name (otherwise there won't be a matching token from the first pass).

So I think we want getLocation() instead.

(Also makes a good test case.)

Agree this is nice, well done! A few more notes for consideration...

clang-tools-extra/clangd/SemanticHighlighting.cpp
312

The conflict-resolution idea is subtle (and IME hard to debug). I'm wary of overloading it by deliberately introducing "conflicts" that should actually be merged. Did you consider the idea of tracking extra modifiers separately and merging them in at the end?


BTW: we're stretching the meaning of Unknown here. There are two subtly different concepts:

  • clangd happens not to have determined the kind of this token, e.g. because we missed a case (uses in this patch)
  • clangd has determined that per C++ rules the kind of token is ambiguous (uses prior to this patch)

Call me weird, but I have "Unknown" highlighted in bright orange in my editor, because I want to know about the second case :-)

534

CXXOperatorCallExpr seems to make sense to me for the most part:

  • functor calls are CXXOperatorCallExpr
  • if x + y mutates y, I want to know that

There are some exceptions though:

  • the functor object itself is more like the function callee, and to be consistent we shouldn't highlight it
  • highlighting x in x += y is weird
  • a << b is a weird ambiguous case ("stream" vs "shift" want different behavior)

I think these can be handled by choosing a minimum argument index to highlight based on the operator type.

I think it makes sense to leave operators out of scope for this patch, but IMO should be a "FIXME" rather than a "let's never do this" :-)

548

nit: this function name is a bit hard to parse, "highlight" is a transitive verb but the rest of the name isn't its object.

Maybe highlightMutableReferenceArguments?

558

I feel like you'd be better off using the FunctionProtoType and iterating over argument types, rather than the argument declarations on a particular declaration of the function.

e.g. this code is legal in C:

int x(); // i suspect this is the canonical decl
int x(int); // but this one provides the type

We don't have references in C of course!, but maybe similar issues lurking...

562

Yeah this is the core of this modifier, worth being precise and explicit here. I think we want to match only reference types where the inner type is not top-level const.

I think we should also conservatively forbid the inner type from being *dependent*. Consider the following function:

template <typename X>
void foo(X& x) {
  foo(x); // this call
}

Locally, the recursive call looks like "mutable reference to dependent type". But when instantiated, X may be const int and is in fact very likely to be deduced that way in practice.

566

You may want to add an "unwrapping" step so that e.g. ArraySubscriptExpr a[i] can be unwrapped to a and then possibly highlighted (and its operator-overloaded form).

Wouldn't suggest doing that in this patch, but you could leave a note if you like.

567

And getLocation() will do the right thing for DeclRefExpr, MemberExpr, and others, so this can just be isa<DeclRefExpr, MemberExpr> with no need for dyn_cast.

clang-tools-extra/clangd/SemanticHighlighting.h
73

nit: no line break here. The line breaks separate normal modifiers vs scope modifiers (which are mutually exclusive) vs sentinels

74

nit: "mutable" is more natural than "non-const", and modifiers don't use heavily C++-specific names anyway

Please spell out "reference", we don't use other abbreviations.

I'd consider "UsedAsMutableReference" rather than "passed" as this seems to be the deeper semantic idea we're expressing, and we may want to generalize this in some way

nridge added inline comments.Aug 19 2021, 10:57 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
312

I don't have a strong opinion on the options here, just wanted to chime in and say I also highlight Unknown prominently for similar reasons :)

tom-anders planned changes to this revision.Aug 19 2021, 11:31 AM
tom-anders added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
567

I'm not sure which getLocation() you're talking about here. There's DeclRefExpr::getLocation(), but neither Expr::getLocation() nor MemberExpr::getLocation(). Am I missing something?

sammccall added inline comments.Aug 19 2021, 11:37 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
567

No, I think I'm just going mad (I was thinking of Decl::getLocation I guess).
Never mind and sorry!

tom-anders added inline comments.Aug 19 2021, 12:10 PM
clang-tools-extra/clangd/SemanticHighlighting.cpp
558

I'm not really sure how to get from the CallExpr to the FunctionProtoType, can you give me a hint?

567

np :D

nridge added inline comments.Aug 19 2021, 5:52 PM
clang-tools-extra/clangd/SemanticHighlighting.cpp
558

I think FD->getType()->getAs<FunctionProtoType>() should do it

tom-anders marked 12 inline comments as done.Aug 23 2021, 12:35 PM
tom-anders added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
312

I switched to @sammccall's idea of having a map<Range, Modifier>, I agree that it feels more natural than the previous version

534

Agreed, added a FIXME

562

I adjusted the condition accordingly and added a few more test cases (including stuff like const int*&).

566

Added a FIXME.

567

Switched to getLocation() and added a test case.

tom-anders marked 3 inline comments as done.
  • Apply suggested renaming and fix nits
  • Use a map for extra modifiers instead of abusing conflict resolution
  • Use ArrayRef instead of callback
  • Add FIXME for handling CXXOperatorCallExpr
  • Use getLocation() instead of getBeginLoc()
  • Add FIXME for unwrapping operators
  • Iterate over FunctionProtoType instead of FunctionDecl
  • Adjust highlighting condition and revise test cases
  • Remove accidentally added empty line
  • clang-format once again

@sammccall @nridge Thanks for the positive feedback, let me know if I missed any of your suggestions!

Thanks for the update! Looks pretty good, just a few minor comments from me. Will leave approval to Sam.

clang-tools-extra/clangd/SemanticHighlighting.cpp
470

Looking at existing usage of associative containers in clangd code, llvm::DenseMap and llvm::DenseSet (which are hashtable-based) are more commonly used than std::map and std::set, though it may require some boilerplate to teach them to use new key types (example)

We could also consider making the value just a vector instead of a set (maybe even an llvm::SmallVector<HighlightingModifier, 1> given its anticipated usage), since even in the unlikely case that we get duplicates, addModifier() will handle them correctly

553

nit: declarations --> types

554

nit: parameters

560

I think we can relax the dependent type check here a bit, for example std::vector<T>& is a dependent type but it's definitely a mutable reference.

(Feel free to leave that for later and just add a FIXME.)

566

Maybe also add a // FIXME: Handle dependent expression types just so we don't forget

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
773

(test case needs uncommenting)

tom-anders marked 5 inline comments as done.Aug 28 2021, 4:00 AM
tom-anders added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
470

Agreed, changed std::set to llvm::SmallVector. I can't really judge if it's worth using llvm::DenseMap here though.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
773

Actually it needed to be removed, has been replaced by the test above

tom-anders marked an inline comment as done.
  • Use llvm::SmallVector, add more FIXME, remove old commented out test

Thanks for the updates! LGTM from my side.

clang-tools-extra/clangd/SemanticHighlighting.cpp
555

nit: still a typo here ("paramteres")

562

nit: typo (idDependentType)

sammccall accepted this revision.Sep 6 2021, 3:05 AM

Thanks! This looks good to me to land as is.
I'd like to see the std::map eliminated, but if you don't want to do this I'm happy to take a stab at it after.

Want someone to commit this for you?

clang-tools-extra/clangd/SemanticHighlighting.cpp
470

We do try to avoid std::map/std::set as they're pretty gratuitous performance sinks. And the map here may not be tiny.

I'd suggest either:

  • make this a flat vector<pair<Location, HighlightingModifier>>, sort it after building, and use binary search to find the right place. This has the same big-O as std::map but without all the allocations and pointer chasing
  • Specializing DenseMapInfo<Location> in protocol.h and using DenseMap<Location, SmallVector<HighlightingModifier,1>>

We don't really need the full Range in the key, as the start defines it.

We could also consider making the value just a vector instead of a set

In practice these sets of modifiers end up being a bitmask, so if using a map, you could just store a uint32_t of 1<<Modifier and add an addModifiers(uint32_t) to HighlightingToken. I'm too really worried about preserving that encapsulation.

This revision is now accepted and ready to land.Sep 6 2021, 3:05 AM

Yeah, I‘d need someone to commit this for me.

About the std::map thing: let’s proceed as suggested and have @sammccall take look at it himself after :)

I'm happy to commit it. Did you want to make the SmallVector<1> --> uint32_t change, or should I just commit it as is?

I’d say commit as is and have a look at the SmallVector -> unit32_t thing together with the std::map issue afterwards

This revision was automatically updated to reflect the committed changes.

Apologies, forgot about this. Commited now :)