This is an archive of the discontinued LLVM Phabricator instance.

clangd SemanticHighlighting: added support for highlighting overloaded operators
AbandonedPublic

Authored by nridge on Feb 6 2022, 5:54 AM.

Details

Reviewers
iannisdezwart
Summary

Recently I switched from the default vscode Microsoft C/C++ extension to clangd because of performance reasons.
I found out clangd lacks one of my most liked features from the Microsoft extension: overloaded operators are not highlighted as functions.
This is a very essential semantic indicator, because overloaded operators can have very different functionality.
This patch adds syntax highlighting for the following patterns:

  1. operator calls to overloaded operators, e.g.
std::cout << 42; // `<<` is highlighted as if it were a function.
std::string s;
s = "Hello, World!"; // `=` ditto
some_map["key"] = "value"; // `[` and `]`
some_functor(1, 2, 3); // `(` and `)`
  1. function calls to overloaded operators in function notation, e.g.
operator<<(std::cout, 42); // `operator<<`
some_ptr->some_field.operator++(); // `operator++`
operator delete[](ptr) // `operator delete[]`
  1. any reference to an overloaded operator function, e.g.
const auto &fn = some_struct.operator<<; // `operator<<`

Before:

After:

Diff Detail

Event Timeline

iannisdezwart created this revision.Feb 6 2022, 5:54 AM
iannisdezwart requested review of this revision.Feb 6 2022, 5:54 AM

Fixed a bug I introduced by removing a null-check. I added the null-check back in again.

nridge added a comment.Feb 7 2022, 1:22 AM

Thanks for the patch!

Could you add some tests to clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp as well?

I wrote a bit about how the tests work on Discord, copying for convenience:

In SemanticHighlightingTests.cpp, we have code inputs which are annotated with the highlighting kinds (and in some tests, modifiers) that various tokens should get.
You can build the test suite with ninja ClangdTests
Then you can run an individual test case using a command like ./tools/clang/tools/extra/clangd/unittests/ClangdTests --gtest_filter=SemanticHighlighting.MyTestName

nridge added a comment.Feb 7 2022, 1:54 AM

I haven't looked at the patch in detail, but one high level question: have you considered the possibility of adding these highlightings during the findExplicitReferences phase, rather than in CollectExtraHighlightings? (I haven't thought through whether that would work, just wondering if you have. The reason this is worth asking is that if we can get findExplicitReferences to handle overloaded operator calls, other clangd features that use findExplicitReferences would benefit from knowing about such calls as well.)

clang-tools-extra/clangd/SemanticHighlighting.cpp
659

This FIXME was specifically about highlighting mutable reference arguments for overloaded operator calls, so it remains unfixed. (I realize that wasn't super clear.)

Two changes:

  1. Added tests for overloaded operators into SemanticHighlightingTests.
  2. Fixed the way tokens are expanded. Previously operator<< would be split into two highlighting tokens: operator and <<. Now it's all one highlighting token. The same is true for all other compound operator keyword statements.
iannisdezwart accepted this revision.EditedFeb 7 2022, 6:25 AM

I haven't looked at the patch in detail, but one high level question: have you considered the possibility of adding these highlightings during the findExplicitReferences phase, rather than in CollectExtraHighlightings? (I haven't thought through whether that would work, just wondering if you have. The reason this is worth asking is that if we can get findExplicitReferences to handle overloaded operator calls, other clangd features that use findExplicitReferences would benefit from knowing about such calls as well.)

I have in fact looked at the possibility of adding the code into findExplicitReferences, but I figured it would be more suitable to add it into CollectExtraHighlightings, because it is easier to distinguish declarations from references. In findExplicitReferences the declaration will be traversed multiple times, so it would be a pain to check if it had already been traversed. I found it was relatively easy to write code that would work in CollectExtraHighlightings, and I also don't know for sure if it's even possible to handle all cases in findExplicitReferences.

Edit: oops, I see I accepted my own revision, uhhhh that was not my intention. How can I undo that?

Also, I think I misunderstood your question, I think putting this code into findExplicitReferences for the non-declaration overloaded operator references would make a lot of sense. It might be a good idea to do this (in an other commit?)

Edit 2: actually now that I think of it, this functionality is already implemented in findExplicitReferences. But for highlighting this causes problems since it only works for a single token, and operator tokens are compound. This is why canHighlightName returns false for overloaded operators, so they will not be highlighted "normally". This patch fixes this by adding code to CollectExtraHighlightings.
In short: findExplicitReferences already contains the functionality you were referring to!

This revision is now accepted and ready to land.Feb 7 2022, 6:25 AM

Fixed formatting & reverted a deleted FIXME comment.

iannisdezwart marked an inline comment as done.Feb 16 2022, 7:48 AM
iannisdezwart requested review of this revision.Feb 17 2022, 6:08 AM

Would be nice if this can further be reviewed and commited to the repo!

Still haven't had a chance to take a super detailed look, but a few high-level thoughts:

  1. The AST nodes that reference operator names should store source ranges associated with the operator names, such that we shouldn't need to do token-by-token manipulation. For example, functionDecl->getNameInfo().getCXXOperatorNameRange().
  2. I think I appreciate now the issue with findExplicitReferences(): it gives us ReferenceLoc objects, which store a single SourceLocation but do not retain the AST node containing the reference (thereby not allowing us to access the source range of the reference easily). We've run into this before (here), and contemplated modifying ReferenceLoc to retain the AST node containing the reference, but decided against it, so I guess we should stick to that. That said... would it make sense to handle the single-token cases with findExplicitReferences()? Or would that just end up splitting the logic / duplicating code more?
  1. The AST nodes that reference operator names should store source ranges associated with the operator names, such that we shouldn't need to do token-by-token manipulation. For example, functionDecl->getNameInfo().getCXXOperatorNameRange().

Fleshing this out a bit more:

  • For FunctionDecl, getNameInfo().getCXXOperatorNameRange() is almost the right thing, but it excludes the operator keyword itself. We know getLocation() was giving us the location of the operator keyword, so we can construct the source range we want as SourceRange(getLocation(), getNameInfo().getCXXOperatorNameRange().getEnd()).
  • For DeclRefExpr, we can do the same thing, using DeclRefExpr::getNameInfo().
  • For MemberExpr, same thing using MemberExpr::getMemberNameInfo() (and getMemberLoc() as the start of the range).
  • For CXXOperatorCallExpr, it seems to only handle single-token cases so we should be good to continue using just getOperatorLoc()

I believe that should allow us to avoid token manipulation altogether.

  1. [...] would it make sense to handle the single-token cases with findExplicitReferences()? Or would that just end up splitting the logic / duplicating code more?

Thinking more about this, I don't think it would make sense. For example, it's not the case that all references to overloaded operators could be handled by findExplicitReferences() (there are multi-token reference cases like explicit operator calls). So, let's just keep using CollectExtraHighlightings for all cases here.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
815

I'm going to suggest that we exercise a few more implicit-call cases:

  • single-token, e.g. foo << 1
  • operator(), e.g. foo()
  • operator[], e.g. foo[1]

(Very interestingly, in the implicit () and [] cases, the opening one seems to be colored by VisitDeclRefExpr but the closing one by VisitCXXOperatorCallExpr. Not quite sure why that is.)

Thank you for the comment, it's very helpful.

I didn't know exactly how to get all operator token locations and getCXXOperatorNameRange().
This is going to be extremely helpful and will clean up the code to make it more readable.
Thank you for the detailed list of how to get the full operator name token location for each specific token type.

About your comment on the implicit calls to operator[] and operator(),
this was also a workaround. Do you have any idea how to get both parentheses from a single DeclRefExpr?
I researched this, but haven't yet found the answer.

I will try incorporating these changes today or tomorrow, if I can find time.

About your comment on the implicit calls to operator[] and operator(),
this was also a workaround. Do you have any idea how to get both parentheses from a single DeclRefExpr?
I researched this, but haven't yet found the answer.

Looking at the AST for some example code like this:

struct S {
  void operator()(int);
};

int main() {
  S s;
  s(2);
}

the AST for the line with the implicit call looks like this:

`-CXXOperatorCallExpr 0x9f9ec0 <line:7:3, col:6> 'void' '()'
  |-ImplicitCastExpr 0x9f9ea8 <col:4, col:6> 'void (*)(int)' <FunctionToPointerDecay>
  | `-DeclRefExpr 0x9f9e58 <col:4, col:6> 'void (int)' lvalue CXXMethod 0x9f9588 'operator()' 'void (int)'
  |-DeclRefExpr 0x9f9e18 <col:3> 'S' lvalue Var 0x9f97c0 's' 'S'
  `-IntegerLiteral 0x9f9e38 <col:5> 'int' 2

Note the CXXOperatorCallExpr spand the s(2) (columns 3 through 6), and DeclRefExpr for the call operator spans the (2) (columns 4 through 6).

The current impl uses DeclRefExpr::getLocation() to grab the (, and CXXOperatorCallExpr::getOperatorLoc() to grab the ). The latter works because getOperatorLoc() is documented to behave this way:

/// When \c getOperator()==OO_Call, this is the location of the right
/// parentheses; when \c getOperator()==OO_Subscript, this is the location
/// of the right bracket.

I've proposed that we switch to using DeclRefExpr::getNameInfo().getCXXOperatorNameRange() as the range to highlight. It now occurs to me that for implicit ( ) and [ ], this is likely to return a range that includes the arguments as well, which we don't want. So, we may need to check implicit vs. explicit, and for implicit highlight the start and end tokens separately. Two potential ways for how to do that:

  1. Highlight DeclRefExpr::getLocation(). Check if DeclRefExpr::getEndLoc() is different, and if so highlight that too.
  2. Same as the current patch, continue visiting CXXOperatorCallExpr, highlight CXXOperatorCallExpr::getOperatorLoc(), and for ( and [ also highlight DeclRefExpr::getLocation().
nridge requested changes to this revision.Feb 28 2022, 12:24 AM

One final note: many of the methods I've suggested produce a SourceRange, and we'll need to convert it to Range. I think it should be fine to convert the begin and end SourceLocations separately using getRangeForSourceLocation(), which will give you two Ranges B and E, sanity-check that B <= E, and take (B.begin(), E.end()) as the final range.

That's all I've got for now. I may have follow-up suggestions about code organization / reuse (e.g. factoring out the modifier-setting code), but that will be easier to reason about once the changes discussed so far have been made.

Thanks again for your efforts on this!

This revision now requires changes to proceed.Feb 28 2022, 12:24 AM

@iannisdezwart Hi :) Do you plan to continue to work on this? It seems like we're most of the way there!

Herald added a project: Restricted Project. · View Herald TranscriptMay 23 2022, 11:19 PM
nridge commandeered this revision.Dec 14 2022, 12:21 AM
nridge removed a reviewer: nridge.
This revision now requires review to proceed.Dec 14 2022, 12:21 AM
nridge abandoned this revision.Dec 14 2022, 12:21 AM

Abandoning in favour of D136594