This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Added highlighting for tokens that are macro arguments.
ClosedPublic

Authored by jvikstrom on Jul 15 2019, 7:10 AM.

Details

Summary

Adds semantic highlighting for tokens that are a macro argument.
Example:

#define D_V(X) int X = 1230
D_V(SomeVar);

The "SomeVar" inside the macro is highlighted as a variable now.

Tokens that are in a macro body expansion are ignored in this patch for three reasons.

  • The spelling loc is inside the macro "definition" meaning it would highlight inside the macro definition (could probably easily be fixed by using getExpansionLoc instead of getSpellingLoc?)
  • If wanting to highlight the macro definition this could create duplicate tokens. And if the tokens are of different types there would be conflicts (tokens in the same range but with different types). Say a macro defines some name and both a variable declaration and a function use this, there would be two tokens in the macro definition but one with Kind "Variable" and the other with Kind "Function".
  • Thirdly, macro body expansions could come from a file that is not the main file (easily fixed, just check that the Loc is in the main file and not even a problem if we wanted to highlight the actual macro "invocation")

Event Timeline

jvikstrom created this revision.Jul 15 2019, 7:10 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 15 2019, 7:10 AM
hokein added inline comments.Jul 16 2019, 2:28 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
201

It'd be clearer for the comment to describe which cases we are going to highlight (rather than the cases we don't).

207

The Loc here maybe not in the main file, considering the case

// in .h
#define DEFINE(X) in X;
#define DEFINE_Y DEFINE(Y)

// in .cc

DEFINE_Y

How should this behave when if the same token is used multiple times inside a macro and the uses are different? Could we add this to tests?
E.g.

#define W(a) class a { void test() { int a = 10; } }
W(foo); // <-- `foo` is a variable of a class?
jvikstrom marked 2 inline comments as done.

Address comments.

jvikstrom marked an inline comment as done.Jul 17 2019, 12:29 AM
jvikstrom added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
207

The spelling loc is still going to be in DEFINE_Y I think. And we only highlight arguments in macros. (Added a testcase though)

jvikstrom marked an inline comment as done.

Fixed edge case when removing conflicting tokens.

clang-format

How should this behave when if the same token is used multiple times inside a macro and the uses are different? Could we add this to tests?
E.g.

#define W(a) class a { void test() { int a = 10; } }
W(foo); // <-- `foo` is a variable of a class?

I had completely missed that there could be conflicting tokens when only highlighting macro arguments as well. Added code to just remove conflicting tokens.

I had completely missed that there could be conflicting tokens when only highlighting macro arguments as well. Added code to just remove conflicting tokens.

Picking one of the highlightings looks fine, but we probably want to make sure it's deterministic. Given that we use sort now, I bet it won't be. Maybe include kinds into comparison as well? That's not a perfect solution, but would at least make sure the user-visible behavior is not random.
Could you add tests for that case too?

hokein added inline comments.Jul 17 2019, 2:13 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
207

ok, I don't have an exact case for macros now, but my gut feeling tells me we will encounter cases where the Loc is not in main file.

here is a test case for declarations, we will visit the foo() decl in test.h as well. This could be addressed in a separate patch.

// test.h
void foo();

// test.cc
#include "test.h"
void foo() {}

I had completely missed that there could be conflicting tokens when only highlighting macro arguments as well. Added code to just remove conflicting tokens.

Picking one of the highlightings looks fine, but we probably want to make sure it's deterministic. Given that we use sort now, I bet it won't be. Maybe include kinds into comparison as well? That's not a perfect solution, but would at least make sure the user-visible behavior is not random.
Could you add tests for that case too?

Already added the case you sent a comment on in the test case. (look at the top of it)
Don't understand what you mean with the sort though. Kinds are already included in the comparator for sort. After the call to unique the only tokens that will share ranges are the ones that have different kinds.

Could just have a custom comparator in the call to unique that only compares the tokens' ranges which would leave us with the token whose kind is the lowest when there are conflicting ones. But do we really want to highlight anything when there are conflicts?

Maybe we should add another kind of Kind for when a token is conflicting?

Thanks for pointing out I missed stuff. The strategy for conflicting highlightings LG, just a few NITs left from my side.

Already added the case you sent a comment on in the test case. (look at the top of it)

Ah, sorry, I missed it. There are many there and the one you added slipped my mind.

Don't understand what you mean with the sort though. Kinds are already included in the comparator for sort. After the call to unique the only tokens that will share ranges are the ones that have different kinds.

Ah, right. Sorry, missed that as well, sort() LG, the results are deterministic.

Could just have a custom comparator in the call to unique that only compares the tokens' ranges which would leave us with the token whose kind is the lowest when there are conflicting ones. But do we really want to highlight anything when there are conflicts?
Maybe we should add another kind of Kind for when a token is conflicting?

Not highlighting for conflicting kinds LG (there is one interesting case that I think we should support at the end of my comment, other than it looks ok).
Conflicting would be hard to explain to users, so not having it look like a better option.

Now, one interesting case that we should probably support is assert. Schematically, it does something like:

#define assert(COND) if (!(cond)) { fail("assertion failed" #COND); }
assert(x != y);

Despite the argument being mentioned twice, we definitely only want highlightings from the first occurrence (that would give us kinds for expressions).
I would expect that this already works, in which case it's just a matter of adding a test case to guard against future regressions.

If this does not work, we should probably fix it in a separate change.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
195

Could you add a comment explaining that we choose to not highlight the conflicting tokens?

hokein added inline comments.Jul 17 2019, 5:20 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
207

ok, here is the case, the spelling loc is not in main file, it is in <scratch space>

// test.h
#DEFINE(X) class X {};
#DEFINE_Y(Y) DEFINE(x##Y)

// test.cc
DEFINE_Y(a);
jvikstrom marked 2 inline comments as done.

Ignore tokens outside of main file. Added testcase for assert.

jvikstrom added inline comments.Jul 17 2019, 10:13 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
207

You are correct. The other comment was actually correct as well, I just put the arguments for the header code in the place where the source code should be in.

Added a check in addToken that the Loc we are trying to add must be in the main file.

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
195

There is a comment in SemanticHighlighting.cpp at line 43. Want me to add a comment in the test case as well?

hokein added inline comments.Jul 18 2019, 4:54 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
46

we don't care the Kind in HighlightingToken now, I think we could simplify the code by tweaking the deduplication logic above?

llvm::sort(Tokens, [](const HighlightingToken &L, const HighlightingToken &R) {
                 return L.R < R.R;
               });
std::unique(Tokens.begin(), Tokens.end(), [](const HighlightingToken &L, const HighlightingToken &R) {
                 return L.R == R.R;
               });
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
195

The comment in SemanticHighlighting.cpp file is too far away, there is no harm to add comment here, indeed, it somehow improves the test code readability .

jvikstrom marked an inline comment as done.Jul 18 2019, 4:57 AM
jvikstrom added inline comments.
clang-tools-extra/clangd/SemanticHighlighting.cpp
46

This would still keep one entry of the conflicting token though. (If we have one Kind that is a variable and one that is a function. One of those tokens would still be in Tokens at the place they were conflicting as unique removes every element but one of duplicates)

If we have conflicting tokens we want to remove all of them because it doesn't really make sense to highlight them as anything.

jvikstrom updated this revision to Diff 210530.Jul 18 2019, 5:03 AM

Added comment saying conflicting tokens are not highlighted in the test.

jvikstrom marked an inline comment as done.Jul 18 2019, 5:04 AM
nridge added a subscriber: nridge.Jul 27 2019, 5:28 PM
jvikstrom updated this revision to Diff 214619.Aug 12 2019, 6:19 AM

Rebased to master.

jvikstrom updated this revision to Diff 214620.Aug 12 2019, 6:21 AM

Moved comment.

@ilya-biryukov @hokein pinging about this cl (I had completely forgotten about it somehow. Just updated it to the new master though)

ilya-biryukov added inline comments.Aug 19 2019, 2:59 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
55

This is potentially O(n^2). Could we instead create a new vector and fill it with the new items?

The memory usage should not matter much - we have the AST stored in the background anyway. I bet we would be looking at it first if we wanted to minimize memory usage.

If we really wanted to not waste any memory, we could do it std::erase_if-style, i.e. move the items we want to remove to the end of the vector, call vector::erase once at the end.

ilya-biryukov added inline comments.Aug 19 2019, 3:03 AM
clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
197

Unrelated to the change: do we plan to highlight macro names?
That seems very useful.

jvikstrom updated this revision to Diff 215907.Aug 19 2019, 8:23 AM
jvikstrom marked 2 inline comments as done.

Rewrote conflicting token removal code.

jvikstrom added inline comments.Aug 19 2019, 8:28 AM
clang-tools-extra/clangd/SemanticHighlighting.cpp
55

Completely forgot that erase is linear. Will go for the "allocate another vector" approach.

Rewrote the entire conflicting checking code to hopefully be simpler as well (it changed substantially, so you might want to take another look at it).

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
197

Yes

ilya-biryukov accepted this revision.Aug 19 2019, 9:11 AM
ilya-biryukov marked an inline comment as done.

LGTM from my side (and a few NIT, but up to you whether to apply them)

clang-tools-extra/clangd/SemanticHighlighting.cpp
66

NIT: Using a for-loop here could provide a little more structure and limit the scope of TokRef

for (ArrayRef<Token> Toks = Tokens; !Toks.empty(); ) {
  // ...
  Toks = Toks.drop_front(Conflicting.size());
}
80

NIT: the following version seems more readable: TokRef = TokRef.drop_front(Conflicting.size());

clang-tools-extra/clangd/unittests/SemanticHighlightingTests.cpp
197

Awesome, can't wait for it to land!

This revision is now accepted and ready to land.Aug 19 2019, 9:11 AM
This revision was automatically updated to reflect the committed changes.
jvikstrom marked 2 inline comments as done.
Herald added a project: Restricted Project. · View Herald TranscriptAug 19 2019, 9:27 AM